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Many  experiments  are  designed  to  take  measurements 
on  the  decom.position  of  one  particular  entity  into  several 
parts.   The  effects  of  other  variables  in  the  experiment 
on  this  decomposition  are  then  noted  and  analyzed.   This 
dissertation  is  a  first  step  toward  the  analysis  of  such 
a  multivariate  response  of  continuous  proportions.   Two 
possible  distributions,  the  Dirichlet  distribution  and 
a  generalization  of  the  Dirichlet  distribution,  are 
proposed  as  models  for  this  response  vector.   It  is  then 
desired  to  investigate  the  general  inference  problems 
of  estimation  and  hypothesis  testing  for  these  distri- 
butions. 

However,  before  attempting  to  solve  the  multivariate 
problem,  it  was  thought  best  to  investigate  the  univariate 
case.   A  univariate  response  of  this  type  would  consist 
of  a  single  continuous  proportion,  and  it  is  assumed 
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that  such  a  response  will  follow  a  beta  distribution. 
The  problem  then  is  to  estimate  and  to  conduct  tests 
of  hypothesis  about  the  parameters  and  mean  of  such  a 
beta  distribution. 

Chapter  3  deals  with  methods  of  estimation  for  the 
beta  distribution.   Two  sets  of  estimators,  the  moment 
estimators  and  the  maximum  likelihood  estimators,  are 
given  for  the  parameters  of  the  beta  distribution.   For 
the  mean  of  the  beta  distribution,  a  geometric  mean 
estimator  is  given  in  addition  to  the  moment  and  maximum 
likelihood  estimators.   Comparisons  among  the  estimators 
are  made  in  terros  of  their  biases,  expected  mean  squares, 
and  variances. 

In  Chapter  4  various  tests  of  hypothesis  are 
constructed  for  the  parameters  and  mean  of  the  beta 
distribution.   Both  one-sided  and  two-sided  hypotheses 
are  considered  and  uniformly  most  powerful  or  uniformly 
most  powerful  unbiased  tests  are  given.   To  obtain  the 
critical  values  for  such  tests  two  methods  of  approx- 
imation, a  normal  approximation  and  a  beta  approximation, 
are  given.   Where  it  is  possible,  comparisons  are  made 
between  the  two  methods . 

Finally,  Chapter  5  deals  with  estimation  for  the 
Dirichlet  distribution.   Methods  for  obtaining  both 
the  moment  and  the  maximum  likelihood  estimators  for 
the  parameters  of  the  Dirichlet  distribution  are  given. 
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As  was  noted  earlier,  this  is  just  a  beginning  on  the 
original  problem.   It  remains  to  be  determined  if  the 
properties  of  the  estimators  for  the  parameters  and 
mean  of  the  beta  distribution  also  hold  for  the 
parameters  and  mean  of  the  Dirichlet  distribution. 
Also,  the  problem  of  developing  tests  of  hypothesis  for 
the  Dirichlet  distribution  is  still  to  be  investigated. 
Finally,  the  question  of  the  usefulness  of  the 
generalization  of  the  Dirichlet  distribution  needs  to 
be  answered. 


IX 


CHAPTER  1 
INTRODUCTION 

In  many  experiments  in  various  disciplines,  an 
analysis  is  done  on  the  constituent  parts  of  one  mea- 
surement.  Generally,  these  constituents  are  expressed 
in  terms  of  the  percentage  or  the  proportion  which  each 
one  makes  of  the  entire  measurement.   For  example,  after 
some  treatments  have  been  applied  to  a  set  of  experi- 
mental units,  a  chemical  analysis  might  be  performed 
to  determine  if  the  treatments  have  had  an  effect  on 
the  composition  of  the  material.   From  such  an  experiment 
one  might  measure  the  total  protein  content,  and  then 
divide  that  total  protein  into  three  or  more  types 
of  protein. 

Specifically,  consider  an  experiment  in  v/hich  a 
measurement  is  made  of  the  chemical  composition  of 
shrimp.   Measurements  are  made  of  the  variables 
%  solid,  %  water,  total  extractable  nitrogen,  total 
extractable  protein,  and  many  others.   The  %  solid  is 
further  divided  into  %  fat,  %  protein,  %  ash,  and 
%  carbohydrates.   The  treatments  to  be  applied  to  the 
shrimp  consist  of  two  sets.   Some  of  the  shrimp  will  be 
stored  on  ice  for  periods  of  zero,  seven,  and  fourteen 


days.   The  remaining  shrimp  will  be  divided  into  five 
batches  and  each  batch  v;ill  be  cooked  by  a  different 
cooking  method.   The  experimenter  wishes  to  know  if  the 
two  sets  of  treatments,  the  storing  times  and  the  cooking 
methods,  have  any  effects  on  the  composition  of  the  shrimp. 

At  present,  the  most  frequently  used  method  of 
analyzing  such  data  is  to  consider  each  component  of 
the  chemical  analysis  separately.   Techniques,  such  as 
analysis  of  variance,  are  applied  to  determine  if 
differences  exist  among  the  treatments.   In  some  experi- 
ments, this  component  by  component  analysis  may  be  what 
the  experimenter  wants.   However,  it  may  also  be  that  the 
experimenter  would  like  an  overall  measure  of  differences 
between  the  treatments,  rather  than  the  analysis  for 
each  component  separately.   For  example,  considering 
the  shrimp  data,  the  experimenter  may  wish  to  know  if 
the  breakdown  of  %  solid  into  its  four  constituents 
is  the  same  for  all  the  treatments.   If  the  separate 
analyses  result  in  the  conclusion  that  the  treatments 
are  different  for  some  constituents  but  not  for  others, 
it  may  not  be  clear  whether  the  treatments  are  different 
overall. 

One  possible  solution  to  this  problem  would  be  to 
create  a  vector  of  the  constituents,  assume  the  vector 
has  a  multivariate  normal  distribution,  and  use  well- 
known  results  of  multivariate  analysis  to  analyze  the 
data.   However,  the  assumption  of  multivariate  normality 


may  not  be  appropriate.   In  the  shrimp  data,  the  per- 
centages of  the  four  constituents ,  which  make  up  the 
total  percentage  of  solids,  must  themselves  add  to  the 
total  percentage  of  solids.   If  one  forms  a  vector  of 
the  constituents,  using  either  the  percentages  themselves 
or  the  proportion  that  each  percentage  takes  of  the 
total  percentage  of  solids,  then  the  elements  of  that 
vector  are  subject  to  a  constraint.   Thus,  a  better 
approach  may  be  to  try  to  determine  the  distribution 
of  such  a  vector,  and  through  that  distribution,  make 
inferences  about  the  population  from  which  that  vector 
has  come . 

The  problem,  then,  is  one  of  inference  in  general. 
Although  the  ultimate  goal  is  to  make  inferences  about 
a  vector  of  continuous  proportions  using  the  Dirichlet 
or  generalized  Dirichlet  distribution,  this  dissertation 
deals  for  the  most  part  with  inferences  about  a  single 
continuous  proportion  using  the  beta  distribution. 
Some  results  dealing  with  the  Dirichlet  distribution 
are  given  in  Chapter  5. 


CHAPTER  2 
REVIEW  OF  THE  LITERATURE 


A  distribution  which  naturally  presents  itself  to 
this  problem  is  the  Dirichlet  distribution.   The  deri- 
vation of  the  Dirichlet  distribution  is  straightforward 
and  is  discussed  in  Hogg  and  Craig  (1965) .   Let 
^l'*'*'^k+l  ^^  mutually  stochastically  independent 
random  variables  each  having  a  gamma  distribution  with 
parameters  a^   and  3=1.   The  joint  distribution  of 
X^,..  .  ,y.^_^_j^   is,  then, 

k+1        a.-l  -X. 
4>(X  ,...,x    )  =   n     1   x/   e       0<X.<co.    (2.1) 

1 


Let 


X. 

Y.  =  ^        i  =  1,2,. ..,k    0<Y.<1 

X  +...+X   ,  1 

1      k+1 


\+l   =   V---+^k+l     0<^k+l<"-  (2.2) 


Then 


^1=   Vk+1 \  =  Vk+1 

\+l  =  \+l(l-V^2-----\)-  (2.3) 
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To  obtain  the  density  of  Y-,...,Y,  , ,  the  Jacobian  of 
the  above  transformation  from  the  X-'s  into  the  Y-'s  is 


J  = 


0 


•k+1 
0 


0 
0 

"k+1 


-Y      -Y      -Y 
k+1     k+1     k+1 


Y         Y 
k+1       k 

-Y,  n  1-Y  -...-Y 
k+1     1       k 


By  expanding  the  determinant  about  the  last  column, 
for  k  an  even  integer. 


J  =  Y, 


•k+1 
0 


^k+1 


-Y      -Y      -Y      -Y 
k+1    k+1    k+1    k+1 


-Y, 


k+1 


-Y, 


^k+1 
0 


k+l 


-Y      -Y      -Y      -Y 
k+1    k+1    k+1    k+1 


-Y. 


k+1 


+  (l-Y^-Y2-...-Yj^)  Y-^^ 


Expanding  the  determinant  in  the  first  term  about  the 
first  column,  the  determinant  in  the  second  term  about 
the  second  column ,  and  so  forth , 


J  =  Y  [-  (-Y    )  ]  (Y    )^  "^-Y  r  (-Y    )  1  (Y    )^~'^ 

+Y3[-(-Y^^^)](Y^^^)^-^±...-Y^[(-Y^^^)](Y^^^)^-l 

"^^-V^2----\^^k.l 
=  ^k.l  (2.4) 


for  k  even.   Similarly,  if  k  is  odd,  J  =  (Y,  , )  .   Then 

*'^ Vi'  '  'Vk.i'"'''---<Wi'"'''' 

^k+l"^ 

[Y    (1-Y  -. . .-Y  )] 
^  k+1^    1      ^k^^ 

(Y_.)^  e  ^+V  n   r(a.) 

K+X  j^^-L      1 

i=l   i    J^+1 

a,  ,^-1  -Y,  ,,  k+1 
(1-Y  -...-Y  )  ^+1  e  ^""V  n  r(a.). 
Ik  i=l    1 

(2.5) 


Thus, 


k   a,-l    k    a.  ,  -1 
k+1 


f  (Y  ,...,Y,  )  =   n   Y.^   (1-  Z  Y.) 
^  ^     i=l   ^       i=l  1 


~   k+1 

E  a.-l 
i=l  ^     -Y    k+1 
\+l      -      ^"V.n_^r(a.)  dY^^^, 

0 


k+1 
r(  Z  a.)  k   a.-l    k   a,  ,,-1 

i=l  ^   n  Y.    (1-  EY.)  ^  -^    0<Y.<1. 
k+1      1=1  ^       1=1^  1 

n  r (a.) 

i=l    ^  (2.6) 


From  an  inspection  of  the  Dirichlet  distribution, 

it  is  easily  seen  that  this  distribution  has  some  of 

the  properties  desired  in  a  distribution  for  a  vector 

of  continuous  proportions.   First,  it  is  a  distribution 

on  k  continuous  random  variables.   Second,  each  of  these 

random  variables  is  defined  on  the  interval  (0,1),  so  that 

the  Y  's  are  in  the  form  of  proportions.   Third,  if  the 

Dirichlet  density  is  written  in  a  slightly  different 

form,  namely,  letting  P^  =  Y^  for  i=l,...,k  and 

P,  ^T  =  (1-Y  -...-Y  ),  the  density  is 
k+1       Ik 

k+1 

r(  Z  a.)  k+1  a.-l 

f(P  ,P  ,...,P    )  =    i=l  ^  n  P /■   .           (2.7) 

1  Z               k+1    ]^+-L  i^i  1 

n  r(a.) 

i=l   ^ 


Now,  the  P. 's  have  the  property  that  they  sum  to  one. 
Thus,  the  Dirichlet  distribution  seems  to  be  a  good 
candidate  for  a  distribution  for  a  vector  of  continuous 
proportions. 

A  second  distribution  for  a  vector  of  continuous 
proportions  is  a  generalization  of  the  Dirichlet  distri- 
bution that  was  proposed  by  Connor  and  Mosimann  (1969). 
This  generalization  is  based  on  a  "neutrality"  concept 
defined  by  Connor  and  Mosimann.   Consider  a  set 
(Pj,...,Pj,)  of  nonnegative  continuous  random  variables 

satisfying  the  constraint  that  the  P.'s  sum  to  one. 

1 


Let 


S   =   Z  P    j=l,...,k   with  S.  =  0,  (2.8) 

D    i=l  1  0 


P 
^i-1 


Zj^  =    i    i=2,...,k-l   with  Z^  =  P-^,\  =   1,    (2.9) 


P=  (Pi,...,Pj^)   with  F_.^   =    {Pj^,...,V.) 

and   Pj2  =  (Pj+i/-.wPk)   J<k,   (2.10) 
and  Wj  =  [l/(l-Sj)]Pj2.  (2.11) 

The  concept  of  a  neutral  proportion  is  then  defined 
as  follows: 

Definition  2.1;  Given  a  random  vector  of  proportions, 
(P-j^, . .  .  ,Pj^)  ,  the  proportion  P^^  is  said  to  be 


neutral  if  P   is  independent  of  the  vector 

Intuitively,  this  definition  states  that  "P   does  not 

influence  (i.e.  is  neutral  to)  the  manner  in  which 

the  remaining  proportions  P2,...,Pj^  proportionally 

divide  the  remainder  of  the  unit  interval;  namely  the 

interval  (P^,l)."   The  concept  of  neutrality  is  also 

defined  for  vectors. 

Definition  2.2;  Given  P  divided  so  that  P  =  (P   ,P   )  , 
—        -  -    -jl  -j2  ' 

Pj-]^  is  a  neutral  vector  if  it  is  independent 
of  Wj .   If  p. ^  is  neutral  for  all  j,  then  P 
is  said  to  be  completely  neutral. 
An  important  point  to  note  here  is  that  the  order  of  the 
P^'s  in  the  vector  P  is  critical.   While  P  may  be  neutral 
in  the  vector  (P^,P2  , .  .  .  ,Pj^)  ,  P^   need  not  be  neutral  in 

^^2'^1'^3' •  •  • '-^k^  '  ^   similar  relationship  holds  for 
neutral  vectors. 

Connor  and  Mosimann  next  state  and  outline  the 
proofs  of  three  theorems  relating  these  concepts  of 

neutrality  to  the  random  variables,  Z.,  i=l, ,k 

defined  previously. 

Theorem  2.1;  If  p   is  neutral  for  j=l,2,...,r 
then  the  random  variables  Z,,Z2,...,Z   are 
mutually  independent. 
For  the  natural  extension  of  this  theorem  to  a  completely 
neutral  vector  a  stronger  result  may  be  obtained. 
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Theorem  2.2;  P  is  completely  neutral  if  and  only  if 
Z'^fZ^, . .  .  ,Z^   are  mutually  independent. 
Finally,  from  this  theorem  a  third  result  is  obtained. 

Theorem  2.3;  P  is  completely  neutral  if  and  only  if 
Pj-j^  is  independent  of  Z.^-j^  for  all  j. 
With  the  concept  of  neutrality  thus  defined  and  related 
to  the  random  variables  Z^,  it  is  possible  to  derive 
a  generalization  of  the  Dirichlet  distribution.   Assume 
that  P  is  completely  neutral.   This  implies  that  the 
Z^'s  are  mutually  independent.   Assume  the  density  of 
Z£  is  the  univariate  beta  density, 

-1   a^-l       b.-l 
f(Zi)  =  [B(ai,b^)]    Z.     (1-Z.)  "■         ,  (2.12) 

where  a^^>0   and  h^>0   and  B(aj_,bj^)  =  r  (a^)  r  (b^)/r  (a^+b^) 
is  the  beta  function.   Then 

k-1        -1  a.-l       b.-l 
f(Z  ,...,Z  )  =   nB{a.,b.)   Z.     (l-Z.)  ^   .      (2.13) 

Now,  the  Zj^'s  can  be  transformed  to  the  P-'s  since 
Zj_  =  P^/(l-P^+.  ..+P^_^)  .   The  Jacobian  of  the  trans- 
formation is  lower  triangular  of  the  form 
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J  = 


0 

0 

0 

0 

1 

1-p, 

0 

1 

0 

0 

0 

0 

1-p^- 

■^2 

1 

0 

1- 

-p    _p    _p 
1      2    ^3 

k-2 

1-    E    P. 
j  =  l    ^ 


k-1        1 


n     i-s 

m=l  n»-l 


(2.14) 


Thus, 


k-1 


f(p,,. 


l-^l"-"- ._     ..    ^    .^2~-'- 


.,P    )    =      n      B(a      b    )       P    -^       (P_/l-P.) 
^  i=l  111  21 


a   -1  b,-l 

(Vl/1-Pl-----V2)  (l-^l)    ' 


P        l^o-l  p  b,-l 

(1-         2    )    2       ...(i_   Pk_i  )    k 


1-P. 


1-P    -. .    -P 
"■      1    •••    ^k-2 


k-1 

1/    n     (1-s    J 

m=l  ni-1 


k-1 


-l.^i-^„   „       .v^ 


n      B(a    ,b    )    -^P^      /(1-s.       ) 
i=l  11  1  1-1 


k-1 


b,-lk-l 


n   [1-S./ (1-S.   ,)]   ^      n     i/i-s    ,. 

i=l  "■  ^"-^  in=l  "^-^ 
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k-1  ,  b,  ,-lk-l  a.-l 

f(P  ....,P  )  =  n  B(a.,b.)  "-p/^"-^   n  p/ 
1      k    i^i    1   1    k     i=i  1 


k-2      b.-l  k-1        a.-l        b. 

n  (i-s.)  -"  /  n  (i-s.  )  1  (i-s.  )  ^ 
i=i   ^     i=i   1-1       1-1 


k-1  b,_-,-l  k-1  a.-l   k    -a. 

n  B(a.  ,b.)~-^P,  ^   ^  n  P.^   (  E  p.)   ^ 

1=1  1=1       3=1  -^ 


k-2      b.-l  k-1        b.-l 

n  (1-s.)  "■  /  n  (i-s.  )  ^ 
i=i   1      i=i   1-1 


^-^      -1  Vi-i 

n  B(a.,b.)  ^  p/  J- 

i=l      1    1        k 


k-1  a.-l   k    b._,-(a.+b.) 

n  P.^   (  E  P.)  ^  -^    ^  ^    ,  (2.15) 

i=l  ^    j=i  3 


where 


k  k-1 

E  P.=  1,  P,  =  1-  I  p.,  and  b^  is  arbitrary. 

-   1  K        »  .  1          U 

1=1  1=1 


In  the  special  case  when  b.  ,  =  a.+b.  i=2,...,k-l,  this 
^  1-1    11 

generalization  of  the  Dirichlet  distribution  reduces 
to  the  Dirichlet  distribution  itself.   Thus,  it  is  seen 
that  the  distribution  just  derived  is  indeed  a  general- 
ization of  the  Dirichlet  distribution. 
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Some  interesting  properties  of  this  distribution 
in  relation  to  the  Dirichlet  distribution  are: 

1.  Since  the  Dirichlet  distribution  is  a  special 
case,  a  vector  of  proportions  following  a 
Dirichlet  distribution  is  completely  neutral. 

2.  From  the  symmetry  of  the  Dirichlet  distribution, 
if  P  has  a  Dirichlet  distribution  then  any 
permutation  of  P  is  necessarily  completely 
neutral.   But,  for  P  to  follow  the  generalized 
Dirichlet  distribution,  only  one  permutation  of 
P  need  be  completely  neutral. 

From  these  results,  it  is  quite  evident  why  the  general- 
ized Dirichlet  distribution  may  more  easily  fit  a  vector 
of  continuous  proportions  than  the  Dirichlet  distribution. 
If  there  exists  one  completely  neutral  vector  among 
the  permutations  of  P,  and  if  it  may  be  assumed  that  the 
Z . • s  have  univariate  beta  distributions ,  then  P  has  a 
generalized  Dirichlet  distribution.   Furthermore,  to 
rule  out  the  Dirichlet  distribution,  it  is  necessary 
to  find  only  one  permutation  of  P  which  is  not 
completely  neutral. 

The  final  section  of  Connor  and  Mosimann's  paper 
deals  with  two  examples,  one  of  which  will  be  considered 
briefly.   The  data  for  this  example  come  from  a  horny 
covering,  called  a  scute,  on  the  underside  of  a  particular 
variety  of  Mexican  turtle.   The  undershell  is  divided 
into  two  sections,  an  anterior  and  a  posterior  section. 
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The  anterior  section  is  covered  by  five  scutes,  one 
gular  scute  and  a  pair  of  humeral  and  pectoral  scutes. 
Measurements  are  taken  along  the  midline  of  the  length 
of  the  gular,  humeral,  and  pectoral  scutes.   If  these 
lengths  are  denoted  by  '^^,'^2'    ^^'^   ^3  ^-espectively , 
while  the  total  length  is  L,  then  the  proportion  of  the 
total  length  taken  by  each  scute  may  be  formed  by 
Pi  =  ^i/L  for  i=l,2,3.   These  types  of  data  are  used  by 
taxonomists  to  distinguish  between  different  populations 
of  turtles. 

Through  a  study  of  the  correlations  of  P,  with 

^2/-'-~-^l'  ^2  ^^^^  ^l/l~P2'  ^^^  ^3  ^^th  P1/I-P3.  it  was 
found  that  the  vector  P  =  (Pi,P2.P3)  is  completely 
neutral.   For  other  orderings  of  P,  it  was  found  that 
the  vector  is  not  neutral,  a  larger  pectoral  of  humeral 
scute  favoring  the  gular  scute  to  occupy  proportionately 
more  of  the  remaining  interval.   However,  there  is  a 
problem  which  Connor  and  Mosimann  themselves  point  out. 
"In  these  analyses  the  correlation  coefficient  is  used 
as  a  measure  of  dependence  or  nonneutrality .   Significant 
non-zero  correlations,  tested  by  Fisher's  Z  transformation, 
are  taken  as  evidence  of  nonneutrality.   On  the  other 
hand,  even  if  the  population  correlations  are  zero, 
neutrality  does  not  necessarily  follow  with  our  non- 
normal  data."   In  other,  words,  a  vector  which  is  claimed 
to  be  neutral,  need  not  be  neutral  at  all.   Thus,  some 
better  measure  of  neutrality  would  be  very  useful. 


15 


Finally,  since  P  =  (P3_fP2'^3^  ^^  ^  completely 

neutral  vector  and  no  other  ordering  of  P  yields  a 

completely  neutral  vector,  then  the  generalized 

Dirichlet  distribution  may  be  considered  for  P,  while 

the  Dirichlet  distribution  may  not  be  considered. 

If  it  is  assumed  that  Z^  and  Z^   have  beta  distributions, 

then  P  has  a  generalized  Dirichlet  distribution. 

Thus,  at  this  point,  it  seemed  reasonable  to 

consider  only  the  generalized  Dirichlet  distribution 

for  a  vector  of  continuous  proportions,  since  the 

Dirichlet  distribution  is  a  special  case  of  the 

generalized  Dirichlet  distribution.   However,  a  recent 

paper  by  James  (1972)  would  seem  to  restrict  the 

usefulness  of  the  generalized  Dirichlet  distribution. 

The  main  thrust  of  James's  paper  is  that  if  certain 

ratios  of  the  P^'s,  other  than  Z.,  are  assumed  to  have 

beta  distributions,  then  the  generalized  Dirichlet 

distribution  reduces  to  the  Dirichlet  distribution. 

As  an  example,  a  theorem  from  James's  paper  states. 

Theorem  2.4;  Let  (P^,...,?^^)  have  the  generalized 

Dirichlet  density  and  suppose  that  each  of 

the  variables  U^^  =  P^/1-  Z  p.  i=l,...,n-l 

IS  marginally  beta.   Then  b  =  a   +b 

i    i+1   i+1 

for  all  i  =  l,...,n-l  and  consequently 
(P^,...,P  )  has  the  Dirichlet  density. 
Thus,  the  rather  simple  assumption  that  U.  has  a  beta 
distribution  reduces  the  generalized  Dirichlet  distribution 
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to  the  Dirichlet  distribution.   There  is  apparently 
then  a  dilemma.   On  the  one  hand,  the  practical  example 
of  the  turtle  scutes  leads  to  the  conclusion  that  the 
generalized  Dirichlet  distribution  is  of  value,  while 
James's  theoretical  considerations  imply  that  it  has 
little  value. 

It  seems  then,  that  as  well  as  the  general  inference 
problems  of  estimation  and  testing,  there  are  the 
problems  of  a  measure  of  neutrality  for  a  neutral 
vector  and  of  determining  whether  the  Dirichlet 
distribution  or  the  generalized  Dirichlet  distribution 
better  fits  a  vector  of  continuous  proportions.   Although 
these  are  important  problems,  their  solution  will  be 
left  for  future  consideration. 

This,  then,  is  the  problem  and  the  work  which  has 
been  done  on  it,  as  set  forth  at  the  beginning  of  this 
dissertation.   It  became  evident  as  the  research  pro- 
gressed, that  the  problem  was  quite  broad.   Thus,  the 
results  which  follow  are  a  beginning  toward  utilization 
of  the  Dirichlet  distribution  for  analyzing  a  vector  of 
continuous  proportions.   Most  results  deal  not  with  the 
Dirichlet  distribution,  but  rather  with  the  beta  distri- 
bution.  Since  the  Dirichlet  and  generalized  Dirichlet 
distributions  are  so  closely  related  to  the  beta  distri- 
bution, and  since  work  on  estimation  of  and  testing 
hypotheses  about  the  parameters  of  the  beta  distribution 
has  been  rather  slight,  it  was  thought  to  be  a  good 
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starting  place  for  work  on  the  entire  problem.   As  is 
pointed  out  in  Chapters  3  and  4 ,  the  chapters  dealing 
with  inference  for  the  beta  distribution,  some  results 
are  directly  applicable  and  others  may  generalize 
rather  straightforwardly  to  the  Dirichlet  and  generalized 
Dirichlet  distributions.   Chapter  5  gives  the  results 
thus  far  obtained  for  the  Dirichlet  distribution. 
However,  some  questions  still  remain  unanswered  and 
will  provide  the  basis  for  further  research. 


CHAPTER  3 
ESTIMATION  FOR  THE  BETA  DISTRIBUTION 

Introduction 

If  the  Dirichlet  or  generalized  Dirichlet  distri- 
bution is  to  be  used  as  a  model  for  a  vector  of  contin- 
uous proportions,  it  will  be  necessary  to  have  a  means 
of  estimating  the  parameters  of  these  distributions. 
For  the  generalized  Dirichlet  distribution,  the  parameters 
may  be  estimated  through  the  Z . ' s  which  have  univariate 
beta  distributions.   This  might  also  be  done  for  the 
Dirichlet  distribution  since  it  is  a  special  case  of  the 
generalized  Dirichlet  distribution.   However,  the  parameters 
of  the  Dirichlet  distribution  may  also  be  estimated 
directly.   Since  the  Dirichlet  distribution  is  a  multi- 
variate beta  distribution,  it  is  quite  evident  that 
similar  problems  will  be  encountered  for  the  Dirichlet 
distribution  as  for  the  beta  distribution  in  any  method 
of  estimation.   For  example,  in  maximum  likelihood 
estimation,  the  likelihood  equations  for  the  beta 
distribution  are 


E  In(Xi) 
'i'(S)  -  ^(a+B)  =  i=l    ^  (3.1) 
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and 

n 
^  ^     E  ln(l-X.) 

^(6)  -  ^{a+S)    =   i=l .,  (3.2) 

n 

where  ^    is  the  digamma  function. 

For  the  Dirichlet  distribution,  the  equations  are  similar. 


k+1^       I    ln(P.  .) 
"Via.)    -  ^(  E  a.)  =  i=l     -*      j=l ,  . . .  ,k+l.  (3.  3) 
-•      j=l  ^        n 


Thus,  the  solution  of  these  equations  is  an  extension  of 
the  solution  of  the  equations  for  the  beta  distribution, 
and  will  be  given  in  Chapter  5.   For  this  reason,  a  study 
was  made  of  possible  estimators  for  the  parameters  of 
the  beta  distribution,  and  the  properties  of  those 
estimators.   Specifically,  exact  formulas  or  approximations, 
where  exact  formulas  were  not  obtainable,  were  found  for 
the  biases,  variances,  and  covariance  of  the  estimators. 
In  addition  to  estimating  the  parameters ,  estimators  for 
the  mean  of  the  beta  distribution  have  also  been  considered. 
This  was  done  because  such  an  estimator  would  have  more 
practical  significance  to  an  experimenter  than  estimators 
of  the  parameters  themselves. 

Sample.  Moment  Estimators 

The  first  estimators  considered  were  the  sample 
moment  estimators.   Let  M^^  denote  the  first  sample  moment. 
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"  n   2 

T,   X^/n,  and  M2  denote  the  second  sample  moment,   Z  X-/n. 
i=l  i=l 

Using  the  method  of  moments  technique  of  estimation, 

these  sample  moments  are  equated  to  the  corresponding 

population  moments  in  terms  of  the  parameters  of  the 

distribution.   The  equations  obtained  are  then  solved 

for  the  estimators  of  the  parameters. 


g  a(a+l) 

^1  =  .  .       and       M  =   ^  ^   ^  ^      .      (3.4) 
a+3  (a+3)  (a+6+1) 


From  the  first  equation, 
a(l-M  ) 


^1      .  (3.5) 


Substituting  into  the  second  equation, 

M^(a+1) 
M2  =  (a+d(l-M^)+l)   ;  (3.6) 


^1 


Thus, 


/s  2     2 

ctM^  -  aM   =  M   -  MM   .  (3.7) 


M-  (M  -M^)  (l-M  )  (M  -M  ) 

1   1  „2  ^        112 

a  =        2  and     B  =          2      .   (3.8) 

M^-M,  M  -M           ^         ' 

2   1  2   1 
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These,  then, are  the  moment  estimators  of  the  parameters 
of  the  beta  distribution.   The  moment  estimator  of  the 
mean  of  the  beta  distribution  is  simply  the  first 
sample  moment,  M,. 

Since  the  estimators  of  a  and  3  are  ratios  of 
functions  of  the  sample  moments,  their  expected  values  , 
variances,  and  covariance  are  not  easily  obtainable. 
Therefore,  first  order  approximations  for  the  biases, 
variances,  and  covariance  of  the  estimators  were  found. 
Let 


g(^l'^2^  =    M2-m2 


(3.9) 


and 


h(M^,M2)  = 


(1-M^)  (M^-M^) 


(3.10) 


Then,  by  expanding  these  functions  in  a  Taylor  series 
and  approximating  them  by  the  first  order  terms  of  the 
series 


a  =  g(M^,M2)  =  g  (E  (M^^)  ,E  (M2)  ) 


5g(M^,M2) 
+  (M  -E(M  ))     M^ 


6g(M^,M2) 


+  (M2-E(M2))     JmJ" 


E(M^)  ,E(M2) 


E(M^),E(M2)  (3.11) 
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and 


3  =  h(M^,M2)  ^    h(E(M^)  ,E(M2)) 


6h(M^,M2) 


+  (M^-E(M^))     6M-,_ 


6h(M^,M2) 


+  (M2-E(M2))     W^ 


E{M^)  ,E(M2) 


E(M^)  ,E(M2).  (3.12) 


Thus,  first  order  approximations  to  the  biases,  variances, 
and  covariance  of  the  estimators  of  a  and  g  may  be  found. 


6g(M^,M2) 


E(a-a)  -   E((M^-E(M^)  ))    M^ 


E(M3_)  ,E{M2) 


6g(M^,M2) 


+  E((M2-E(M2)  ))    6M2 


E(M^),E(M2).    (3.13) 


6h(M^,M2) 


E(6-B)  =  E((M^-E(M^)))    6M^ 


E(M^),E(M2) 


6h(M^,M2) 


+  E((M2-E(M2)))    5m^ 


E(M^),E(M2).    (3.14) 


But  the  partial  derivatives  of  g(M^,M2)  evaluated  at  E(M^) 
and  E(M2)  are  constant  when  the  expected  value  is  taken, 
and  thus  these  terms  drop  out  of  E(a-a).   Therefore, to  a 
first  order  approximation  the  estimators  of  the  parameters 
are  unbiased. 

First  order  approximations  to  the  variances  and 
covariance  of  the  estimators  are  found  similarly. 
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Var  (a)    -   Var (M   ) 


8g{M^,M2) 


6M. 


E(M^)  ,E(M2) 


+  Var(M2) 


6g(M^,M2) 


6M, 


E(M^)  ,E(M2) 


+    2    Cov(M^,M2) 


&g{M^,H^) 


6M, 


Sg{M^,l^^) 


'W, 


E(M^)  ,E(M2)  , 


(3.15) 


Var  (6)    -   Var(M   ) 


6h(M^,M2) 


6M^ 


E(M^)  ,E(M2) 


+   Var(M2) 


6h(M^,M2) 


6M, 


E(M^),E(M2) 


+    2    Cov(M^,M2) 


6h(M^,M2) 


6M, 


6h(M^,M2) 


6M, 


E(M^)  ,E(M2)  , 
(3.16) 


and 
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Gov  (a,  6)  -  Var(M-i^) 


6g(M^,M2) 


6Mn 


6h(Mj^,M2) 


6M, 


E(M3^),E(M2) 


+  Var(M2) 


6g(M^,M2) 


6M, 


6h(M^,M2) 


6M, 


E(M^),E(M2) 


+  Cov(Mj^,M2) 


6g(M-^,M2) 


6M, 


6h(M;L,M2) 


6M, 


'6g{M^,M2) 


6m, 


5h(M^,M2) 


6M, 


E(M^)  ,E(M2) 


.  (3.17) 


Expressions  for  Var(M^),  Var(M2),  Cov(M  ,M  ),  and  the 
partial  derivatives  evaluated  at  E(M-j^),  E(M2)  were  then 
found  in  terms  of  a  and  B. 


Var(M  )  =  E 


n    2     n     2 


Z  X. 
i=l  ^ 


E  i=l 


a  (a+1) 


_  (a+6) (a+3+1)    (a+6) 


t 


Var(M2)  =  E 


Z  X 
i=l 


2s 


^   2, 

Z  X. 

i=l  ^ 


C3.181. 


=  1 
n 


a(a+l)  (a+2)  (a+3) 


2      2 

a^  (a+1)^ 


Ja+B)  (a+B+1)  (a+B+2)  (a+B+3)    (a+B)  ^  (a+B+1)^. 

1(3. 19) 
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Cov(M^,M2) 


=  E 


E  X.  f  E  Xt 
i=l     i=l 


,-E_ 

n 

1=1 
n 

L 

E 

n 

■ 

a  (a+1)  (a+2) 


a  (a+1) 


(a+6) (a+6+1) (a+3+2)    (a+6)  (a+3+1) 


(3.20) 


6g(M^,M2) 


6M, 


^M^  (2M^-M2-Mp 


^^\)'^^V  (M2-m2)2 


E(Mi),E(M2)      (3.21) 


6g(M3^,M2) 


6M, 


^  M^(M^-l) 


^(^l)'^(^2^    (M^-M^ 


E(M^)  ,E(M2) 


(3.22) 


6h(M^,M2) 


6M, 


_M  (1+M  -4M  ) +M^ (1+M  ) 


E(M^)  ,E(M2) 


2  2 
(M2-Mp^ 


E(M^)  ,E(M2) 


and 


(3.23) 


6h(M  ,M  ) 
1   2 


6M, 


M  (2M  -M^-l) 
=   111 


E(M^),E(M2)     (^^_^2j2 


E(M^)  ,E(M2) 


(3.24) 


Since  E(M^)  =  a/a+3  and  E(M2)  =  a (a+1) / (a+B)  (a+6+1) , 
these  partial  derivatives  are  then  expressed  in  terms 
of  a  and  B.   At  this  point  a  computer  program  was  used 
to  evaluate  Var(a),  Var(B)/  and  Cov(a,B)  for  various 
values  of  a  and  B.   The  results  of  those  computations 
are  shown  in  Table  I. 
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The  estimator  of  the  mean  of  the  beta  distribution, 
M^,    is  an  unbiased  estimator,  and  an  exact  formula  for 
its  variance  exists. 


Var(M^)  =  E 


n 

7  z  x/ 

Ji=l   ^ 

2             r 

n 

-V 

L 

E  X. 
Eli=l  ^ 


aB 


(a+B)  (a+B+1) 


(3.25) 


The  variance  of  the  estimator  of  the  mean  for  various 
combinations  of  a  and  B  is  also  contained  in  Table  1. 

Maximum  Likelihood  Estimators 

Derivation  of  the  Estimators 

A  second  natural  set  of  estimators  to  consider  are 
the  maximum  likelihood  estimators.   Unfortunately,  the 
likelihood  equations  for  the  beta  distribution  do  not 
yield  simple  solutions  for  the  estimators.   The  problem 
of  obtaining  the  maximum  likelihood  estimators  for  the 
beta  distribution  has  been  investigated  in  a  paper  by 
Gnanadesikan,  Pinkham,  and  Hughes  (1967).   The  main  thrust 
of  Gnanadesikan,  Pinkham,  and  Hughes's  paper  is  to 
investigate  the  effect  of  using  all  or  some  of  the  order 
statistics  from  a  beta  distribution  to  obtain  maximxjin 
likelihood  estimators  for  the  parameters.   Given  a  sample 
X^,...,X   from  a  beta  distribution  with  parameters 


a  and  3,  so  that 
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f(x.)  =  r(a+g)   x*^  ■''(1-x.)^  ■'■ 
r(a)r(e)   1     1 


0<X.<1 

1 


(3.26) 


Then 


L(X^, 


.,X^) 


r  (g+B) 

r(a)r(6) 


^nx^-^i-x.)^-^ 
i=i  ^ 


(3.27) 


and 


In  L(X  ,  ...,X  )  =  n  In  r(a+6)  -  n  In  T  (a)  -  n  lnr(6) 
1      n 


n  n 

+  (a-1)  E  In  X.  +  (6-1)  S  ln(l-X.). 
i=l    ^        i=l      ^ 

(3.28) 


Note  that  the  notation  has  been  changed  from  Gnanadesikan, 
Pinkham,  and  Hughes's  paper  to  conform  with  the  notation 
used  in  this  dissertation.   The  likelihood  equations 
using  this  notation  are  then 


n  51n  r(a+g)  -  n  51n  T  (a)  +   Z  In  X.  =  0, 


(3.29) 


6a 


6a 


i=l 


and 


n  61n  r(a+B)  -  n  51n  T  {$)    +     Z  ln(l-X.)  =  0, 


(3.30) 


6B 


63 


i=l 
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Equivalently 


n 

S  In  X. 

'f'(a)  -  ^(a+6)  =  i=l      , 
n 


and 


where 


(3.31) 


n 

Z  ln(l-X.) 

^(e)  -  ^(a+B)  =  i^l ^  ,  (3.32) 


y(a)  =  61n  r(a),   ^(6)  =  61n  F  (B)  ,   and 
6a  63 

'I'Ca+B)  =  61n  T  (a+h      =    61n  F  (g+B)  .  (3.33) 

Sa  63 


By  the  nature  of  these  functions  of  a  and  B,  called  "psi' 
functions,  involved  in  the  likelihood  equations,  it  was 
not  possible  to  solve  directly  for  a  and  B.   Thus,  the 
Newton-Raphson  iteration  method  was  used  to  solve  the 
equations.   The  problem  is  further  complicated  for 
Gnanadesikan,  Pinkham,  and  Hughes  since  they  wish  to 
find  maximum  likelihood  estimators  also  for  the  case 
when  only  the  first  p  order  statistics  from  a  sample  of 
size  n  are  used.   The  likelihood  equations  are  then  of 
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the  forin 


and 


n 

Z  In  X 


E  i=l    ^  =  'i'(a)  -  ^{a+h    -    (l-£)  Ii(Xp;cx.3)      (3.34) 


n    n 


"   I(Xp;a,B) 


E  ln(l-X.)  -  - 

P  i=l =  'i'ih    -   "iia+h    -    (l-£)  2''^p'^'^K     (3.35) 


n      n 


"^   I(Xp;a,B) 


where 


(1 


I(x;a,3)  = 


t'^'^d-t)^"^  dt. 


I^(x;a,6)  = 


t°'"^(l-t)^"^ln(t)  dt, 


,1 


l2(x;a,6)  = 


t"  ■'■(l-t)^"-'-ln(l-t)  dt 


(3.36) 


Gnanadesikan,  Pinkham,  and  Hughes  then  go  on  to 
compare  the  estimators  obtained  by  using  various  fractions 
of  the  data  and  present  two  examples  of  their  results. 
However,  the  section  of  their  paper  which  is  relevant 
here  is  the  section  in  which  they  actually  compute  the 
estimators  given  the  entire  sample.   The  method  proceeds 


as  follows.   Let 
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n 

E  In  X. 


ki  =  i=l 


and 


I    ln(l-X. ) 


k2  =  i^ 


(3.37) 


Then  the  likelihood  equations  are 


'F(a)  -  ^^(a+B)  =  k^   and  ¥(3)  -  ^(a+B)  =  k  ,  (3.38) 


where  k   and  k   are  constant  in  terms  of  a  and  3.   Let 
the  solutions  of  these  equations,  a  and  B,  be  equal  to 
some  initial  values  plus  a  correction.   That  is,  a  =  o-Q+h 
and  B  =  3q+1.   Then 


Y(aQ+h)  -  T(aQ+h+BQ+l)  -   k^  =   0        and 


H'(Bq+1)  -  'l'(aQ+h+BQ+l)  -  k2  =  0. 


(3.39) 


Expanding  these  functions  in  Taylor  series'  and  using 
only  the  constant  and  first  derivative  terms  of  those 
series  as  an  approximation,  the  equations  are 


T(ao+h)  -  'i'(an+h+Bn+l)  -  k,  =  ^  (ars)    -   ^  {a^+Bn)    -   k 


C^PQ' 


+  h 


S['F(a)  -  H'(a+B)  -  ^1] 
6a 


°'0'^0- 


+  1 


6[1'(a)  -  y(a+B)  -  ^1] 
6B 


"O'^O- 
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and 


>i'(6o+l)  -  t(aQ+h+3o+l)  -  ^2  -  ^(Bq)    -   ^(ao+Bg)  "  ^2 


+   h 


6a 


an/ 


O'^O-i 


+  1 


63 


"O'^O- 


(3.40) 


Let 


D  = 


6['^(a)-'y  (a+6)-  1] 


6S 

6['F(B)-'F(a+B)-^2] 
6a 


6ri'(a)-f  (a+6)-^l] 


"O'^O       6B 

6['l'(6)-'i'(a+6)-^2] 


OiQ,6Q 
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aQ,3o 


otQ/Bo 


=  'I"  (cXq)^'  (3q)  -  ^'  (aQ+Bg)  [4"  (ag)  +  >i"(6o)3  •    (3.41) 

Now,  let  the  first  correction  to  the  initial  estimate, 
aQ,  be  h-^,    and  the  first  correction  to  Bq  be  1^^.   Then 


hi  = 


H'(aQ+BQ)  +  ^i   -   ^(Oq) 


-^'(V^o) 


1'(aQ+3Q)  +  ^2   ~   '^^^0^    ^'(6q)  -  "{"(Oq+Bq) 


h^  =  [l-CaQ+BQ)  +  k^  -  ^(ag)]  IT' (6q)  -  H"(aQ+BQ)] 
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+H"  (Oq+Bq)  [^(aQ+BQ)  +  ^2  -  '{'(Bq)]  /  D 


(3.42) 


and 


1   = 

1 


H"(aQ)  -  ¥'  (aQ+Bg)    ¥(aQ+BQ)  +  k^  -  ^  {a^) 


-H"(aQ+Bo) 


T(aQ+BQ)  +  ^2  -  T(Bq) 


[T(aQ+Bo)  +  ^2  -  T(Bo)]  ['i"  (0^0^  "  ^'(Co+Bq)] 


+T'  (Oq+Bq)  [H'(aQ+BQ)  +  k^  -  'FCUq)]  /  D  .       (3.43) 


The  entire  process  is  now  repeated  using  otQ+h-,  and  Bq+1-j_ 
as  new  initial  estimators.   The  iteration  continues  until 
it  is  clear  that  the  process  is  converging  to  a  solution. 
As  the  initial  estimators,  a^   and  Bq,  Gnanadesikan 
Pinkham,  and  Hughes  propose  the  sample  moment  estimators 
discussed  earlier.   These  seem  to  work  satisfactorily, 
in  that  the  process  does  converge  to  a  solution.   That  is 
the  Newton-Raphson  method.   The  only  difference  from 
Gnanadesikan,  Pinkham,  and  Hughes's  method  of  solving  the 
likelihood  equations  is  that  Bernoulli  series  approximations 
for  the  derivatives  of  the  psi  functions  have  been  used 
in  the  corrections  to  the  initial  estimators,  rather  than 
the  approximation  of  the  derivatives  by  differences. 
The  Bernoulli  series  approximations  were  given  in  a  paper 
by  Choi  and  Wette  (1969).   To  find  *!"  (X)  ,  the  equation  is 


If  X<8  then  use  is  made  of  the  recurrence  formula 
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H"  (X)    =    {l+{l+[l-{i.  -         2)/X^]/3X}/2X}/X        X>8.  (3.44) 


1"  (X)  =  H"  (X+l)  +  1/X^  .  (3.45) 


For  example. 


r,4.5)  =  r(8.5)  .  ,^_3,2.  ^j^.  "i;^.  ^;^ 


By  using  this  approximation  to  the  derivatives  of  the  psi 
functions,  the  Newton-Raphson  method  converges  in  fewer 
iterations  than  Gnanadesikan,  Pinkham,  and  Hughes  report 
in  their  paper.   In  no  case  did  it  take  more  than  four 
iterations  to  arrive  at  a  solution. 

Since  the  maximum  likelihood  estimator  of  a  function 
of  a  parameter  or  parameters  is  simply  the  function  of 
the  estimators,  the  estimator  of  the  mean,  a/a+g,  is 

A   ■A   /\  ■A  -A 

a/a+3  where  a.   and  3  are  the  maximum  likelihood  estimators 
of  a  and  g. 

Asymptotic  Properties  of  the  Estimators 

Having  obtained  the  maximum  likelihood  estimators, 
the  next  logical  question  would  be  what  sort  of  statis- 
tical properties  do  these  estimators  have.   More  specif- 
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ically,  since  they  are  rather  difficult  estimators 
to  obtain,  are  they  a  significant  improvement  over  the 
sample  moment  estimators.   To  answer  these  questions, 
an  attempt  has  been  made  to  investigate  the  biases, 
variances,  and  covariance  of  the  estimators. 

Because  of  the  fact  that  the  estimators  were 
obtained  through  an  iteration  process  rather  than  being 
given  through  an  explicit  formula,  exact  expressions  for 
the  biases,  variances,  and  covariance  were  unobtainable. 
However,  two  methods  of  approximation  were  available. 
First,  since  these  are  maximum  likelihood  estimators, 
their  asymptotic  distribution  is  well  known  if  certain 
regularity  conditions  are  met.   From  the  asymptotic 
distribution  the  biases,  variances,  and  covariance  for 
large  n  are  known.   Second,  in  an  attempt  to  determine 
the  biases,  variances,  and  covariance  for  smaller  values 
of  n,  the  likelihood  equations  were  expanded  in  a  Taylor 
series.   Since  the  first  and  second  moments  of  the  right 
hand  side  of  the  equations  are  obtainable,  then  the 
expanded  equations  may  be  solved  simultaneously  for  the 
biases,  variances,  and  covariance  of  the  estimators  of 
a  and  B. 

First,  consider  the  asymptotic  distribution  of  the 
estimators.   This  distribution,  for  a  multidimensional 
parameter  vector,  is  given  in  Wilks  (1962) . 
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Theorem  3.1;  If  (X^,...,X^)  is  a  sample  from  the 

c.d.f.,  F(X;9q),  where  0q  is  r-dimensional  and 

F(X;e)  is  regular  with  respect  to  its  first  and 

second  B-derivatives  for  6  in  fl  q ,  and  if  the 

maximum  likelihood  estimator  {Q-,,...,Q) 

satisfying  (12.7.1)  is  unique  for  n>_some  nQ, 

n 
and  measurable  with  respect  to   II  F(X  ;e), 

i=l    ^ 
then  it  is  asymptotically  distributed  for  large 

n,  according  to  the  r-dimensional  normal 

distribution,  N({eQ},||nB   ||~),  where 


1 1^1 1  =  IIV"o'»o'll 


Note  that 


Bpq(e,e')  = 


Sp(X;e')Sq(X;e')  dF(X;e)  .   (3.46) 


To  apply  this  theorem,  the  conditions  of  the  theorem  must 
be  met  for  the  beta  distribution.   Regularity  with  respect 
to  the  first  and  second  e-derivatives  is  equivalent  to 


E(S  (X;6))  =  0     and 
P 

E(S  (X;6)S  (X;e)  +E(S   (X;e)))  =  0  .  (3.47) 


Now  in  this  case 


S  (X;e)  =  51n  f(X,a,g)  =  ^(a+6)  -  4*  (a)  +  In  X, 
^  .   6a 

S  (X;e)  =  51n  f(X,a,B)  =  ^(a+6)  -  H'(6)  +  In(l-X), 
^  66 


and 
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S__(X;e)  =  6^1n  f(X,a,B)  =  4"  (a+B) 


(3.48) 


Thus, 


E[Sp(X;e)]  =  ^(a+B)  -  4'(a)  +  E  (In  X) 


(3.49) 


To  obtain  E(ln  X)  consider 
fl 

xo'-id-x)^-!  dX  =  r(a)r(B)  , 
Q  r(a+B) 

,1 

6_   x"~  (l-X)^""""  dX  =  6_  r  (a)r(B)  , 
6a  I-  6a   r(a+B) 


(3.50) 


(3.51) 


(In  X)  X^  ■'■(l-X)^"-'-dX  =  r(B)_[ri(a)-r(a)H'(a+B)]  ,  and 


r(a+B) 


(3.52) 


E(ln  X)  = 


r(a+B) 

r(a)r(B) 


r(p)  [r'  (a)-r(a)y(a+B)] 
r(a+B) 


=  H'(a)  -  'i'(a+B)  • 


(3.53) 


Therefore, 


E[S  (X;e)]  =  'F(a+6)  -  H*  (a)  +  [H' (a)  -  I'(a+B)] 


=  0  . 


(3.54) 
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Similarly,  E[S  (X;e)]  =  0. 


pq 


E[S^^(X;e)]  =  e|  _S_   In  f  (X,a,6) 

6a56 


=  -E 


6_   In  f  (X,a,3)  6_  In  f  (X,a,3)' 
6a  63 


(3.55) 


and 


EIS  (X;e)S  (X;0)] 

c  SI 


=  E 


6_  In  f (X,a,6)  6_  In  f (X 
fSa  63 


fa,3n  . 
(3.56) 


Thus, 


E[S  (X;e)S  (X;e)  +  E{S   (X;e)}]  =  0, 

P     q        pq 


(3.57) 


Therefore,  F(X,eQ)  is  regular  with  respect  to  its  first 
and  second  e-derivatives.   Since  the  iteration  process 
converges  to  a  single  pair  of  estimators  for  a  and  3, 

then  the  solutions  to  the  likelihood  equations  are  unique, 

„      .    -    -        ^  ^  ^  n 

Expressing  a  as  a  =  aQ+   Z  h^   and  3  as  3  =  3q+  Z  1^, 

i=l  i=l 

then  a  and  3  are  measurable  functions  if  Uq ,  3q,  h. , 

and  1^  are  measurable  functions  for  all  i,  since 

n  n 

lim  Z  h^   and  lim   Z  1-  are  measurable  if  all  the  h-'s 
n-f-oo  i=l       n^-"  i=l  ^ 

and  1^'s  are  measurable.   Let,  M^  and  M  be  the  first 

two  sample  moments.   Then  M^  and  M  are  measurable 

since  the  X^' s   are  random  variables  and  hence  measurable. 
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But,  aQ  =  M^(M-]_-M2)/M2-mJ  and  6q  =  (1-M^)  (M^-M2)/M2-M^. 
Since  products  and  sums  of  measurable  functions  are 
measurable  and  since  a  continuous  function  of  a  measurable 
function  is  measurable,  then  a^   and  Bq   are  measurable 
functions.   Consider  h-.  , 

h^  =  [H'(aQ+BQ)  +  \   -   ^(ag)]  [H"  (Bq)  "  1"(aQ+6Q)] 

+  ^'  (aQ+Bg)  [^(uq+Bq)  +   ^i    -   4'{Bo)]  /  D  ,      (3.58) 


where 


D   =   ^'   (ao)^'  (Bq)  -  'I"  (ao  +  3o)  ['!"  (ccq)  +  "i"   (Bq)]  •    (3.59) 

Since  ^  {X)    and  *f"  (X)  for  0<X<»  are  both  continuous 

functions  and  k-,  and  k~  are  measurable  functions ,  then 

all  components  of  h,  are  measurable.   Thus,  h,  is 

measurable  and  similarly  1,  is  measurable.   Therefore, 

after  one  iteration  the  estimators  a,  =  aQ+h, , 

B-,  =  Bq+1t  are  measurable  functions.   But,  h2  and  I2 

are  computed  by  replacing  a^,  and  Br*  in  h,  and  l,  by 

a-i  and  Bi-   Thus,  h2  and  I2  are  measurable.   Similarly, 

h.  and  1.  are  measurable  for  all  i.   Therefore  a  and  B 
1      1 

are  measurable  functions .   Thus ,  the  conditions  for  the 
theorem  giving  the  asymptotic  distribution  of  the 
maximum  likelihood  estimators  are  satisfied. 
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Once  again,  the  theorem  states  that  the  asymptotic 
distribution  of  the  estimators  is  the  r-dimensional 
normal  distribution,  N({epQ},||n  BpgH"^).   Hence, 
asymptotically  the  estimators,  a  and  6,  are  unbiased. 
The  notation  which  Wilks  uses  for  the  asymptotic  variances 
and  covariance  of  the  estimators,  |  |n  B   ''"''" 


pq 


IS  more 


commonly  given  as  the  inverse  of  n  times  the  information 
matrix,  (nl)  .  To  find  the  information  matrix,  proceed 
as  follows. 


f(X,a,e)  =   r(a+6) 


r(a)r(3) 


X^"^(l-X)^"^ 


(3.60) 


In  f(X,a,6)  =  In  r(a+3)  -  In  r  (a)  -  In  r{3) 


+  (a-1)  In  X  +  (6-1)  In(l-X),   (3.61) 


61n  f(X,a,B)  =  "V  (a+B) 
6a 


-   "H  (a)    +  In  X  ,  and     (3.62) 


6  In  f(X,a,B)  =  >1"  (a+3)  -  4"  (a)  . 
6a^ 


(3.63) 


Therefore, 


;in  f(X,a,3)l 
6a     J 


=  -E 


5  In  f(X,a,3) 
2 


6a' 


=  T'  (a)  -  "¥'  (a+3)  . 


(3.64) 
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Similarly 


5  In  f(X,a,g)  =  H"  (a+6)  -  4"  (g)    and 
63^ 


(3.65) 


5^1n  f(X,a,g)  =  H"  (a+B)  . 

6a66 


(3.66) 


Thus, 


51n  f(X,a,g) 


-,  2 


^F 


=  4"  (B)  -  H"  (a+6)  and   (3.67) 


L[ 


51n  f(X,a,g) 
6a 


][ 


51n  f  (X,a,B) 
6B 


=  -^'  (a+B) . 

(3.68) 


Hence, 


I  = 


T'  (a)  -  ¥'  (a+B)       -y'  (a+B) 


-'!''  (a+B)       ^'  (B)  -  ^'  (a+B) 


(3.69) 


so  that 


I  = 


Y'  (a)  -  "H'  (a+B)       -'}"  (a+B) 


-¥'  (a+B)       H"  (B)  -  'i"  (a+B) 


=  ^'(a)'l"(B)  -  '1"  (a+B)  [W  (a)  +  f  (B)]  .        (3.70) 


Now 
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.-1 


y  (B)  -  y  (g+B) 


1"  (a+B) 


1"  (a+B) 
III 


y  (g)  -  T'  (g+B) 


(3.71) 


Therefore,  asymptotically, 


Var(g)  =  ^ '  (B)  -  1"  (a+B)   , 

STTl 


Var(B)  =  'F'  (g)  -  ^'  (g+g)   ,   and 

HTil 


Cov(g,B)  =  H"  (g+6) 


n  I 


(3.72) 


The  estimator  of  the  mean,g/g+B,  is  asymptotically  unbiased 
since  it  is  the  maximum  likelihood  estimator  of  g/a+B. 
To  find  the  variance  of  the  estimator,  note  that  the 
asymptotic  variance  of  a  function  of  the  maximum  likelihood 
estimators  is  simply  the  variance  of  the  Taylor  series 
expansion  of  the  function  terminated  after  the  first 
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derivatives.   Thus,  asymptotically, 


Var 


g 


ig   l°''2J 


6g/g+3     | "  Var (g) 
^     6( 


gg/g+B 
66   l^'B 


-.2 


Var(B) 


+  2 


Sg/g+B 


5g/g+B 
63 


■|ct,Bj_ 


Cov(g,B) 


B^Var(g)  +  g^Var(B)  -  2gBCov(g,B)  ,   (3.72) 
(a+B)^ 


where  Var(g),  Var(B),  and  Cov(g,B)  are  the  asymptotic 
expressions  obtained  previously.   Substituting  those 
expressions. 


Var   g 


a+B  -' 


=  {  B  [y'  (B)  -  'i"  (g+B)] 


+  g  [4"  (g)  -  ¥•  (g+B)] 


-  2gB  >!"  (g+B)}/n|l|  (g+B)   .     (3.73) 


Small  Sample  Properties  of  the  Estimators 

At  this  point,  a  comparison  of  these  asymptotic 
results  with  the  results  for  sample  moment  estimation 
could  be  made.   However,  some  information,  even  if  only 
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an  approximation,  on  the  small  sample  biases,  variances, 
and  covariance  of  the  maximum  likelihood  estimators 
would  be  desirable.   Fortunately  this  can  be  done. 
Consider  again  the  likelihood  equations 

n 

^  ^      E  In  X. 
y (a)  -  T (a+6)  =  i=l    ^    and 


Z  ln(l-X. ) 
Y(B)  -  "H  (a+B)  =  i=l 


(3.74) 


Now 


n 

E  In  X, 

i=l ] 

n 


E(ln  X.) 


(3.75) 


since  the  X.'s  are  a  random  sample  from  a  beta  distribution 
1 

with  parameters  a  and  6.   As  has  been  shown  before, 
E(ln  X.)  =  ¥(a)  -  'l'(a+B).   Similarly 


E  ln(l-X.) 
i=l      ^ 


=  E[ln(l-X^)]  =  H'(B)  -  ^(a+3).    (3.76) 


Now  let 


K(a,3)  =  ¥(a)  -  ^(a+6)  -  [H*  (a)  -  ^(a+6)], 


L(a,6)  =  >i'(6)  -  H'(a+6)  -  [4'(6)  -  4'(a+B)], 
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Z  In  X. 
k  =  i=l    ^  -  [^(a)  -  H'(a+6)]  ,  and 


E  ln(l-X.  ) 
1  =  i=l      "-   -  [H'(6)  -  ^(a+6)], 
n 


(3.77) 


so  that  the  likelihood  equations  may  be  transformed 
into  the  following  equations, 

K(a,B)    =  k        and 


L(a,B)    =   1    . 


(3.80) 


Now  let 


m. .    =  E(  k^l^    )    , 
ID 


yj_j   =  E[    (S-a)^(3-6)-'    ]    / 


ID 


L.  .    = 
ID 


[6  1 

i 

[^  1 

L6aJ 

K(a,B) 


6 
6a-' 


§_ 
6B- 


i!j! 


^    L(a,g) 


and 


i!j! 


a,B 


Ot/6 


(3.81) 


By  expanding  K(a,B)  in  a  Taylor  series  about  the  point 
(a,B)  , 


K(a,B)  =  'l'(a)  -  '{'(a+B)  -  I'i'(a)  -  ^(a+B)] 
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K(a,B)  =  K(a,6)  +  (a-a)  6K(a,6) 


6a 


a.,B 


+    (3-3)  5K(a,B) 


66 


^  7       2^^ 

+    (g-g)   6  K(a;3) 


a, 3 


6a 


a,B 


+  (a-a)  (3-3)  6^K(a,3) 


6a63 


a, 3 


+  (3-B)^   6^K(a,3) 


6B 


+  . . 


a, 3 


(3.82) 


(a-a)  Kj^o  "^  (^"^^  ^01  "*"  ^°^~°^'       ^20 


+  (a-a)  (3-3)  K^^  +  (3-3)   Kq2  +  .•• 


(3.83) 


Similarly 


L(a,B)  =  (a-a)  L,  ^  +  (3-3)  L^.,  +  (a-a)   L 


ao 


01 


'20 


+  (a-a)  (3-3)  L^  +  (3*3)   Lq2  +  •• 


(3.84) 


Since  K(a/3)  =  k  and  L(a,3)  =  1  and  since  the  moments 
of  k  and  1  can  be  found ,  then  the  moments ,  or  more 
specifically  the  y^^^'s,  may  be  expressed  in  terms  of 
the  moments  of  k  and  1.   Specifically,  terminating  the 
Taylor  series  after  second  derivatives. 


0  =  E(k)  =  E(K(a,3))  ^  E{  (a-a)  K^^q  +  (3-3)  K 


^01 


+  (a-a)^  K20  +  (a-a)  (3-3)  K^^i  +  (3-3)^  Kq2} 
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0  =  E(l)  =  E(L(a,B))  ^    E{  (a-a)  L^q  +  (B-3)  Lq^ 

+  (a-a)   L^Q  +  (a-a)  (B-6)  L^^  +  (3-B)^  L^^}, 


m2Q  =  E(k  )  =  E[(K(a,3))^]  -  E{ [  (a-a)  K^q  +  (0-3)  Kq^ 
+  (a-a)^  K2Q  +  (a-a)  (3-3)  K^^  +  (3-3)^  ^02^^^' 


m^^  =  E(kl)  =  E[K(a,3)L(a,3)]  =  E{  [  (a-a)  K^^  +  (3-3)  K^^ 
+  (a-a)^  K^Q  +  (a-a)  (3-3)  K^^  +  (3-3)^  K^^] 
[(a-a)  L^Q  +  (3-3)  L^^^  +  (a-a)  L^^ 
+    (a-a) (3-3)  L^^  +    (3-3)^  L^^l)  /  and 


""02  "  ^^-^^^  "  E[(L(a,3))^]  -  E{[(S-a)  L^Q  +  (3-3)  L^^ 

+  (a-a)^  L20  +  (a-a)  (3-3)  L^^  +  (3-3)^  ^02"^^^' 

(3.85) 


Expressing  these  equations  in  one  matrix  equation  and 
including  only  the  terms  of  the  expansions  involving 
^10'  ^01'  ^20'  ^11'  ^^^  ^02'  ^^""^^  those  are  the  terms 


of  interest/ 
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'20 


11 


02 


^0   ■^01 


L     L 
10    01 


20 


20 


11 


11 


02 


02 


2  2 

°     ^10    ^^10^01    ^01 


^10^01 
°    ^loho   „  +,     ^01^01 

^oAo 


2  2 

10      10  01     01 


10 


01 


'20 


11 


'02 


.(3.86) 


In  matrix  notation,  m  =  Ku  .   Solving  for  the  vector  y_  , 

K  m  =  y_  ,  so  that  an  approximation  for  y_  in  terms  of  a 

matrix  of  constants  with  respect  to  the  distribution, 

F (X  ,...,X  ),  and  a  vector  m,  for  which  moments  can  be 
In  — 

found,  is  obtained.   Notice  that  the  ]£-vector  is  not 
exactly  E(a),  E{B),  Var(a),  Gov (a, 6),  and  Var(6),  but 
rather  E(a-a),  E(3-6),  E(a-a)^,  E(a-a)(6-6),  and  E(6-3)^. 
However,  from  these  expected  values  the  biases,  variances, 
and  covariance  of  the  estimators  may  be  found. 

The  next  question  is  how  good  an  approximation  is 
this.   The  order  of  the  approximation  in  powers  of  n 
depends  on  the  order  in  n  of  the  terms  in  m  and  K,  and 
the  order  of  succeeding  terms  in  the  equation.   In  other 
words,  if  the  Taylor  series  expansion  of  K(a,B)  were 
to  be  carried  out  further,  thus  incorporating  higher 
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moments  of  [  (a-a) ^, (6-6) ^] ,  higher  moments  of  (k^,l^) 

would  need  to  be  included  in  order  to  solve  the  system 

of  equations.   Higher  powers  on  n  would  then  be  included 

in  the  expansion,  implying  a  better  approximation. 

Since  K  is  constant  in  n,  the  order  of  the  approximation 

will  depend  only  on  m.   On  the  condition  that  m. .  is 

ID 
of  order  [(i+j+l)/2]  in  1/n,  where  [  ]  in  this  case 

represents  the  greatest  integer  function,  the  solution 

of  the  matrix  equation  previously  given  yields  p  to  order 

1/n.   Since  this  approximation  worked  fairly  well,  except 

for  very  small  values  of  n,  a  better  approximation  has 

not  been  computed. 

To  prove  the  condition  that  m. .  is  of  order 

[(i+j+l)/2],  consider  the  two  special  cases  of  mj^Q  for 

k  odd  and  for  k  even,  in  order  to  fix  the  ideas  of  the 

proof  for  the  more  general  case  of  m. .. 


n^ 


n 

I    (Y.    -    y) 
i=l 


(3.87) 


where  Y.  =  In  X.,  X  , . . . ,X   are  a  random  sample  from  the 
1       1    1      n 

beta  distribution  with  parameters  a  and  B,    and 

y  =  'l'(a)  -  H'(a+6)  =  E(ln  X.).   Let  k  be  even.   In  the 

n 
expansion  of  [  Z  (Y.-y)]^  there  will  be  a  term 
i=l 

I        (Y^  .-y)^(Y.  -y)^...  (Y.    -y)^.      (3.88) 
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Since  the  X^ '  s  and  hence  the  Y^^  •  s  are  independent  and 
identically  distributed, 


.  ,^  ,    ^^i  "^^  ^^i  -vj)^--.(Yi   -y)^ 

Liifi2=f=---+V2    ^       2         V2 

=  n(n-l) (n-2)... (n-|fl) [E (Y^-y) ^] ^^^         (3.89) 


k/2 
Which  is  of  order  n    .   All  terms  preceding  this  one 

^        k 
m  the  expansion  of  [  I    (Y^-\i)]      will  be  of  smaller 

k/2      ^""^ 
order  than  n  '^   since  they  will  include  a  summation 

over  less  than  k/2  subscripts.   The  next  term  following 
this  one  in  the  expansion  would  be 


(3.90) 


.  .  ^  -^  ,    ^^i  -^)^(^i  -y)^..(Y.  -y)^Y.   -y) 

±l4=X2+---+ik+l    1      ^2         ^k      ^k^^ 


The  expected  value  of  this  term  is  zero  since  the  Y. 's 
are  independent  and 


E(Y.    -y)  =  E(ln  X.    -  E[ln  X.    ])  =  0.         (3. 91] 

^1  Xl        %1 

J+-1  2^1         jfl 


Thus,  all  terms  with  fewer  components  than  (3.88)  will 
have  a  smaller  order  than  n/  and  all  terms  with  more 
components  than  (3.88)  .will  be  zero  in  expectation. 
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Thus,  the  highest  power  of  n  present  in  m,  .,  for  k  even,  is 


k, ,  k/2, 


(1/n^)  (n^'")  =  ~k72  =  [(k+l)/2] 
n      n 


(3.92) 


so  that  m^Q  is  of  order  [(k+l)/2]  in  1/n.   Now  let  k  be 

odd.   By  an  argument  similar  to  the  previous  one,  the 

term  with  highest  power  of  n,  which  shall  be  referred  to 

n        , 
as  the  "worst"  term,  in  the  expansion  of  E[  E  (Y.-y)]   is 

i=l   ^ 


(Y,  -y)^(Y,  -y)^..  (Y.    -y )  ^ 


2 


^k-1 


=  n{n-l)  (n-2)...  (n-lSzi-;-l)E 


(Y   -y)2{E[(Y.  -y)2]} 
^1         ^2 


k-3' 


(3.93) 


which  has  highest  power  of  n  equal  to  n       . 
Therefore,  ni  q,  for  k  odd,  has  highest  power  of  n 


(1/n^)  (n^^~^)/2)  ^   (k+i)/2  =   t{k+l)/2] 


(3.94) 


so  that  m^^Q  is  of  order  [(k+l)/2]  in  1/n.   Now,  consider 


m   in  general, 
pq 


"l 

m 

= 

E 

p 

pq 

n'- 

E  (Y.-y  ) 

i=i   ^  y 


n^ 


I     (Z.-y  ) 
i=l   ^  ^ 


=  „p+q  E 


r  (Y.-y  ) 

i=i    1   y 


I     (Z.-y  ) 
i=l   1   2 


(3.95) 


where 
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Yi  =  in  X.  , 


Z^  =  In(l-X^), 


Py  =  4'(a)  -  '{'(a+B) 


=  E(ln  X^) ,    and 


\i„  =   ^(B)  -  'i'(a+B)  =  E(ln(l-X.))  . 


(3.96) 


Assume  p<q.   A  similar  argument  will  cover  the  case  p>q. 
Let  p  and  g  both  be  even.   Then  there  are  two  possible 
"worst"  terms  in  the  expansion, 


2         2 

(Y.  -m)     (Y.  -y„)  ... 


i,+i2+...ip+Ji+J2+...+j  ■  "1  ^     '2  y 


(Y   -y„)^Z  -V^iZ      -y  )\..(Z   -y  )^ 


E 
2 


(3.97) 


which  has  highest  power  of  n  equal  to  2-  +  3.  and 


2 


(Y,  -y„)  (Z,  -y,)... 


ii  "y    In  '^z 


2  2 

(Y.  -y  )  (Z.  -y  )  (Z.  -y  )  ...  (Z.    -y  ) 
ip   y    Xp   2    31   z        3      z 


(3.98) 
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which  has  highest  power  of  n  equal  to  p  +  2ZE  =  E±a. 
Thus,  m   has  highest  power  of  n 

(1/nP^^)  (nP+'5/2)  ^  i/nP+q/2  ^  ^/^  [  (p+q/2)+l]       ^3^^^^ 

so  that  nipg  is  of  order  [(p+q+l)/2]  in  1/n  for  p  and 

q  even.   For  p  even  and  q  odd,  the  "worst"  terms  have 

powers  of  n  equal  to  E  +  2zi  =  Elazi  and  d  +  ^"P"!  =  P+^-l 

2     2       2       ^2        2 

so  that  nipg  is  of  order  E±£±l  =  [E±3±l]  in  l/n.   For  p 

odd  and  q  even, the  "worst"  terms  have  powers  of  n  equal 

to  E^  +  I  =  Ei|ll  and  p  +  aiEzi  =  ElSll,    so  that  m^^ 

is  again  of  order  E±2±l  =  [E±2±l]  in  1.   Finally,  for 

2         2       n 

p  and  q  both  odd,  the  "worst"  terms  have  powers  of  n 
equal  to  Ezi  +  Szi  =  P±|z2  and  p  +  2ZP  =  ElE  so  that 
mpg  is  of  order  Ejq  =  [£+£]  in  i.   This  completes  the 
proof. 

Now,  if  a  solution  for  the  vector  y  in  the  matrix 
equation  m  =  Ky  can  be  found,  then  an  approximation  of 

order  l/n  for  E(a-a),  E(3-6),  E(a-a)^,  E(a-a)(6-6),  and 

'^    2  . 

E(3-B)   is  obtained.   Solving  the  matrix  equation  involves 

finding  m2Q,  m^^,    and  mQ2  and  the  K-matrix  and  its 
inverse.   Expressions  for  m  and  K  have  been  found,  but 
the  computation  of  k"""-  and  then  of  y  is  done  by  a 
computer  program.   The  results  for  various  combinations 
of  a  and  6  are  given  in  Table  2.   The  K-matrix  is 
found  from  partial  derivatives  of  K(a,6)  and  L(a,B) 


•i  +-;  /N    ^ 

^ii    =    ^  K(a,B) 

5a    63      il    j! 


and 


a, 6 


Since      K(a,B)    =   ¥  (a)    -   4'(a+B)    -    [^(a)    -   H'(a+6)]    and 
L(a,6)    =  '^(3)    -   ^(a+6)    -    ['l'(6)    -   4'(a+3)] 


K^Q   =  ^'  (a)    -   ^'  (a+3) 


^01   "   ^'  ^^^    "   '^'  ^'^'^^^ 


^01  "  "'^'  ^""^^^ 


Lj^Q  =   -4"  (a+6) 


^20   =   'F"(a)    - jyCa+B) 


:-L3_  =   -'F'-Ca+B) 


Lq2   =   y"(3)    -   '^"(a+B) 


L;Li   =   -^"(a+B) 
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L^ .    =    5^-^^    L(a,B) 
5a    6B      i!    jl 


a,B 


(3.100) 


Kq2   =   -^^"(a+B) 


^20  =  ziii^dn.  • 


(3.101) 


As  before,  the  psi  primes  and  double  primes  are  calculated 
using  Bernoulli  series  expansions  of  those  functions. 
The  vector,  m,  is  easily  found  since 


20 


1   "  1   ^ 

±  ZlnX.  -E(£  ZlnX.) 

n  i=l    1      n  i^i    1 


=  Var 


±  Z  In  X. 
"  i=l    ^ 
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_   1 


20   =  ^  Vardn  X.) 


=   i  EI  (In   X.)^    -    [E(ln   X.)]^] 
n  1  1 


=   i  E[(ln   X.)       -    m'(a)    -    4'(cc+3)]^]    and       (3.102) 
n  1 


E(ln   X.)^    =  r  (g+g)       x"    •"■  (1-X  .  )  ^""^In   X?    dX  . 

1  I       n   /-■■^  71   >nx  11  11 


r(a)r(B)     ^ 


rjo+B) 
r(a)r(e) 


6  r(a)r(3) 


L  5C.2        r(a+S)     J 


r(a+B)       [r"(a)r(B)    -   r-  (a)r(3)T(a+B) 

r(a)r(3) 


-   r(a)r(B)'i"  (a+3)    -   r  •  (a)r  (6) V  (a+3) 


+   r(a)r(6)'l'(a+6)    ]    /  T  (a+3) 


=    (^'(a)    +   T(a)^)    -   'l'(a)¥(a+B) 


-   T'  (ot+3)    -   1'(a)H'(a+3)    +  H'(a+3) 


=   ^'  (a)    -   H"  (a+3)    +    [H'(a)    -   T(a+3)]       .     (3.103) 


Therefore, 


in..    =  i   I-i"  (a) 


-  ¥'  (a+3)] 


(3.104) 


similarly 
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and 


"'ll  =  ^ 


n  n 

1   S  In  X .  -  E (  1   E  In  X . ) 
n  i=l    ^      n  i=l    ^ 


,   n  n 

i   Z  ln(l-X.)  -  E(  i   E  ln(l-X.) 
n  i=i       1       n  .^^      x' 


=  Gov 


(  ±  E  In  X  ),(^  Z  ln(l-X.)) 
"  i=l    in  ^^^      1 


=  i  E  [In  X^  In(l-X^)  -  E (In  X^) E (In (1-X^) ) ] 


-  [-"¥'  (a+6)] 
n 


(3.105) 


02 


_  1 


=  £e  [(In(l-X^))^  -  [E(ln(l-Xj^))]^] 


=  -  ["1"  (6)  -  'i"  (a+B)]  . 
n 


(3.106) 


This  gives  an  approximation  of  order  1/n  for  E(a-a), 
E(6-3),  E(a-a)^,  E(a-a)(B-6),  E(B-6)^.   The  accuracy 
■  of  this  approximation  depends  on  the  magnitude  of  the 
coefficients  of  the  l/n^  terms,  the  1/n^  terms,  and  so 
forth  in  the  expansion.   An  idea  of  how  well  the  approxi- 
mation works  can  be  gathered  by  looking  at  Table  2 . 
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Notice  that  the  problem  is  symmetric  in  a  and  B ,   so  that 

it  is  only  necessary  to  find  the  biases,  variances,  and 

covariance  for  combinations  of  a  =  a  and  6  =  b.   The 

results  for  a  =  b  and  B  =  a  will  be  found  from  the 

symmetry.   Since  Var(a)  =  E(a-a)^  -  [E(a-a)]^  and  Var(a)>0, 

this  provides  a  check  on  the  accuracy  of  the  approximation 

for  a  given  sample  size  n.   For  example,  consider  the 

first  entry  in  Table  2 ,  for  a  =  6  =  • 1  •   Here 

E(a-a)  =  '20503  ^^^  E(S-a)2  =  -01515^   Evidently,  for 

n  =  1,  the  approximation  is  not  good  since  the  bias 

squared  is  then  larger  than  the  expected  mean  square, 

ECa-a)  .   However,  for  n  =  10,  E(a-a)^  =  .001515  and 

E(a-a)  =  .020503,  so  that  the  bias  squared  is  smaller 

than  the  expected  mean  square,  which  makes  the  variance 

of  a  positive.   Consider  now  the  entry  for  3  =  .1,  a  =  10. 

Here  E(S-a)  =  ^-5135  ^^d  E(S-a)^  =  ^013.568    p^r  n  =  10, 
n  n 

the  bias  squared  is  still  larger  than  the  expected 
mean  square.   A  sample  size  of  at  least  13  is  needed 
before  the  bias  squared  becomes  smaller  than  the  expected 
mean  square.   For  a  sample  of  size  20  the  bias  squared 
would  be  31.088  and  the  expected  mean  square  would  be 
50.6784,  so  that  the  Var(S)  would  be  19.5904.   From 
Table  2,  it  can  be  seen  that  for  larger  values  of  6  a 
sample  size  of  5  or  6  is  sufficient  so  that  the  bias 
squared  is  smaller  than,  the  expected  mean  square. 
For  example,  take  the  case  where  a  =  5  and  B  =  1.   Here 
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E(a-a)  =  iiiMOS  and  E(a-a)2  =  57.0243^   ^  3    ^^  ^^^^ 
n  n 

of  5  is  large  enough  so  that  the  expected  mean  square  is 
larger  than  the  bias  squared,  and  for  a  sample  of  size 
20,  [E(a-a)]^  =  .6529  and  E(a-a)^  =  2.8512.   Thus,  the 
Var(a)  =  2.1983.   For  most  parameter  values,  then,  one 
needs  a  sample  of  only  about  20  observations  to  have 
a  reasonable  approximation  to  the  biases,  variances, 
and  covariance  of  the  parameters. 

To  obtain  approximations  of  order  1/n  for  E(a/a+6) 
and  Var(a/a+3),  an  expansion  of  the  function  a/a+3 
in  a  Taylor  series  is  needed. 


g  =   g  +  (g-g)  6g/g+g 
a+3    g+3  <Sg 


a, 3 


+  (3-3)  6a/g+3 
63 


a, 3 


^    2   2^.^  ^ 

+  (g-g)   6  g/g+3 


6g 


a, 3 


+  (g-g)  (3-3)  5  g/g+3 
6g63 


g,6 


+  (3-3)   5  g/g+3 
2        ^2 
63 


g,3 


(3.107) 


Since  all  terms  of  the  expansion  not  included  in  the 
approximation  are  of  higher  order  than  1/n,  an 
approximation  to  E  (g/g+3)  need  only  include  these 
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six  terms.   Thus, 


E(  JL-  )  =  -^  +  E(S-a) 


a+B     °'+e         L(a+3)2J 


+  E(B-6) 


L(a+6)2J 


+  E(a-a) 


2  r 


+  E(3-B) 


+  E(a-a)  (3-3) 


and 


a-3  ] 
-(a+3)^-J 


(a+a)-"-! 


Var  (_5L_) 

a+3 


'6a/a-f3    1   Var  (a)  + 
L  6S   l^^'^J 


"6  a/ 


L  63 


a+3    "I 


Var  (3) 


+  2  r6a/S+B     1  n 

L  6a   l^'^JL' 


6a/a+3 
63   l"'^- 


Cov(a,3) 


!^ Var  (a)  +   a^   Var  (3) 


(a+3) 


(a+3) 


+  -2a3  Gov (a, 3)  . 
(a+3)^ 


(3.108) 


Again,  this  approximation  is  of  order  1/n  since  all  terms 
of  the  expansion  which  are  left  out  of  the  approximation 
are  of  higher  order  than  1/n  when  the  variance  of  the 
estimator  is  found.   The  bias  and  variance  of  the 
estimator  are  included  in  Table  2  for  n  =  10  and  n  =  100. 


Geometric  Mean  Estimator 
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Another  estimator  for  the  mean  of  the  beta  distri- 
bution is  the  geometric  mean  of  the  observations.   The 

n   -  , 
estimator  itself  is   n  X/^,  which  estimates  a/a+6. 

i=l 
The  bias  of  the  estimator  is  easily  computed  directly 

from  the  distribution  of  the  sample. 

^1   rl  . 


n  xV^ 

i=l  ^ 


...   n  x|/^  r  r ( 
Jo   Joi=i    L^ 


a+6) 


yTTBT 


n 

n 

i=l 


ot-l      B-1 

n  X.  (i-x.)    dx^ ...dx 

11       in 


rl 


[ 


r(a+B) 

r(a)r(B) 


rl 


n  a+£-l      B-1 
IT  X.  ^   (1-X. )    dX,  ...dX^ 
i=l  in 


r(a+B)    r(a+(l/n))r  (B) 
r(a)r(B)    r(a+B+(l/n)) 


r  (a+B)r  (g+d/n)) 
r(a)r  (a+B+(l/n)) 


(3.109) 


Thus,  the  bias  in  the  estimator  is 


n 

n  X 

i=l 


l/n_   a 


a+t 


r  (a+B)r  (g+d/n)) 
_  r(g)r(g+B+(l/n)) 


g 
g+f 


(3.110) 


The  variance  can  also  be  computed  directly  from  the 
distribution. 
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Var 


^  1/n 

n  x/ 
i=i  ^ 


=  E 


^   1/n 

n  x/ 
i=i  ^ 


E(  n  x/  ) 
i=l  ^ 


r(a+B)r(a+(2/n)) 
r(a)r(a+3+(2/n)) 


2n 


r  (a+g)r(a+(l/n)) 
_  r  (a)r(a+B+(l/n))  _ 


(3.111) 


Notice  that,  similar  to  the  arithmetic  mean  estimator 
for  the  mean  of  the  distribution,  the  bias  and  variance 
of  the  geometric  mean  are  exact  expressions.   The  bias 
and  variance  are  given  for  various  values  of  a  and  3 
in  Table  3. 


Comparison  of  the  Estimators 

The  final  section  of  this  chapter  will  be  devoted  to 
a  comparison  of  the  three  estimators  of  the  mean  and  the 
two  sets  of  estimators  for  the  parameters,  which  have 
been  considered.   For  this,  the  reader  is  referred  to 
Tables  1,  2,  and  3.   To  determine  what  parameter  values 
to  consider  when  calculating  biases,  variances,  and 
covariance  for  the  estimators,  the  various  types  of  curves 
for  different  choices  of  parameters  in  the  beta  distri- 
bution were  investigated.   A  beta  distribution  with 
parameters  a  =  6  <  1  has  a  U-shaped  distribution. 
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symmetric  about  the  point  X  =  1/2. 

If  0  <  a  <  1,  then 

the  distribution  is  still  U-shaped, 

but  skewed  to  the 

right.   If  a  <  3  <  1,  then  the  distribution  is  again 
U-shaped,  but  skewed  to  the  left.   At  a  =  6  =  1,  the 
distribution  is  identical  to  the  uniform  distribution. 
If  a  >  1  and  6  <  If  the  distribution  increases  with 
increasing  X,  increasing  more  rapidly  the  larger  a  is 
in  comparison  to  g.   If  a  <  1  and  B  >  1,  the  distribution 
is  decreasing  in  X.   If  a  =  6  >  1,  the  distribution  is 
bell-shaped  and  symmetric  about  1/2.   If  a  >  6,  then  the 
distribution  is  skewed  to  the  right.   Finally,  if  a  <  B, 
then  the  distribution  is  skewed  to  the  left.   Thus,  to 
include  all  types  of  curves  represented  by  the  beta 
distribution,  a  range  of  parameters  from  .1  to  10  was 
considered.   More  values  around  1  were  included  since 
that  appears  to  be  a  critical  point  at  which  the  beta 
distribution  changes  shape. 

Two  sets  of  estimators  for  the  parameters  of  the 
beta  distribution,  the  sample  moment  estimators  and 
the  maximum  likelihood  estimators,  were  obtained. 
Through  a  first  order  approximation,  the  sample  moment 
estimators  were  found  to  be  unbiased  to  order  1/n.   For 
the  maximum  likelihood  estimators,  for  a  particular 
value  of  one  parameter,  say  6,  the  bias  in  B  is  fairly 
constant  as  a  varies,  while  the  bias  in  a  increases 
with  increasing  a.   For  example,  for  3  =  .1  and  a  ranging 
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from  .5  to  10,  E(B-6)  ranges  from  -^^^^^   to  jliiM. 

n         n 

For  the  same  parameter  values,  E(a-a)  ranges  from 
2^^301  to  111..5135  ^   For  a  =  B  =  .1,  E(a-a)  =  E(^-Q)    = 

.20503.   The  results  for  other  values  of  6  are  similar 
n 

and  since  the  problem  is  symmetric  in  a  and  B,  similar 

results  would  be  obtained  for  particular  values  of  a. 

Of  course,  the  maximum  likelihood  estimators  are 

asymptotically  unbiased. 

Looking  now  at  the  variances  and  covariance  of  the 

first  order  approximation  for  the  sample  moment  estimators, 

for  a  particular  value  of  B,  and  a  ranging  from  .5  to  10, 

Var(B)  increases  gradually  with  increasing  a.   For  example, 

ata  =  .5  and  B  =  .1,  Var(B)  =  -^^^'^    ,  and  at  a  =  10  and 

n 

B  =  .1,  Var(B)  =  '•'•^^^'^  .   For  the  same  parameter  values 
n  ^ 

Var(a)  increases  much  more  rapidly.   At  a  =  .5  and  B  =  .1, 

Var(a)  =  LiiMi,  and  at  a  =  10  and  B  =  .  1 ,  Var(a)  =  ^212^11. 
n  n 

The  covariance  of  a  and  B  falls  in  between  ranging  from 
jd^ll   at  a  =  .5  and  B  =  .1  to  ^^^^^^    at  a  =  10  and  B  =  . 1. 
The  results  are  similar  for  other  parameter  values. 

For  the  maximum  likelihood  estimators,  the  expected 
mean  squares,  E(a-a)  ,  E(B-B)  ,  and  E(a-a)(B-B)  measure 
the  expected  values  of  the  squares  and  products  of  the 
deviations  of  the  estimators  from  the  true  parameter 
values.   For  the  sample  moment  estimators,  the  variances 
and  covariance  measure  these  deviations  since  the  sample 
moment  estimators  are  unbiased  to  first  order.   Thus,  it 


would  seem  reasonable  to  compare  the  variances  and 
covariance  of  the  sample  moment  estimators  with  the 
expected  mean  squares  and  products  of  the  maximum 
likelihood  estimators.   For  the  parameter  values  consid- 
ered, the  expected  mean  squares  and  products  behave 
quite  similarly  to  the  variances  and  covariance  of  the 
sample  moment  estimators.   For  example,  for  3  =  .1  and 
a  ranging  from  .5  to  10,  E(a-a)   ranges  from  '^^^^^  to 
1013-568^  E(3-3)^  ranges  from  i^lMl   to  i^l^,    and 

E(a-a)  (3-3)  ranges  from  -Q^^^^  to  iiMi.   Two  differences 

n        n 

from  the  sample  moment  estimators  are  apparent.   First, 

^    2 

E(3-3)   decreases  slightly  as  a  increases.   Second,  and 

more  important,  the  expected  mean  squares  and  products 
are  somewhat  smaller  than  the  variances  and  covariance 
for  the  sample  moment  estimators.   The  proportional 
differences  appear  to  be  greatest  for  small  parameter 
values.   For  example,  at  a  =  .5  and  3  =  .1, 


Var(a)  =  1.4159       E(a-a)^  =  .85955 


Var(3)  =   .0497      E{3-3)^  =  .01141 


Cov(a,3)  =  .1534      E(a-a)(3-3)  =  .03196    (3.112) 
n  n 
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and  at  a  =  5  and  6=2, 


Var(a)  =  56.5833      E(a-a)^  =  50.507 


Var(6)  =  8.2367       E(6-B)^  =  6.9664 


Cov(a,6)  =  18.55      E(a-a)(6-6)  =  15.782.  (3.113) 
n  n 


Thus,  the  maximum  likelihood  estimators  are  somewhat  more 
exact  than  the  sample  moment  estimators. 

Asymptotically,  Var(a),  Var(B),  and  Cov(a,S)  are 

^    2 
identical  to  the  first  order  approximations  of  E(a-a)  , 

E(3-6)  ,  and  E(a-a)(B-6).   Hence,  the  maximum  likelihood 

estimators  are  also  more  exact  estimators  asymptotically 

than  the  sample  moment  estimators. 

For  the  mean  of  the  beta  distribution,  the  sample 

moment  estimator  is  unbiased.   If  a  =  B,  the  first  order 

approximation  of  the  maximum  likelihood  estimator  is 

also  unbiased.   For  a  >  B  and  a  particular  value  of  B, 

the  bias  is  sometimes  a  decreasing  function  of  a,  and 

sometimes  an  increasing  function  of  a.   For  example, 

for  B  =  .1,  E(a/a+B)  ranges  from  .2003/n  to  .01455/n  as 

a  increases  from  .5  to  10.   For  B  =  .5,  E(a/a+B)  increases 

from  .1107/n  to  .1418/n  and  then  decreases  to  .0574/n 

as  a  ranges  from  1  to  10.   For  a  <  B,  the  bias  is  simply 

the  negative  of  the  bias  for  the  same  combination  with 
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a  >  3.   For  example,  for  a  =  .5  and  B  =  .1,  E(a/a+3)  = 
.2003/n  and  for  a  =  .1  and  3  =  .5,  E(a/a+3)  =  -.2003/n. 
For  estimating  the  mean,  a  third  estimator,  the  geometric 
mean,  was  considered.   Since  the  expected  value  and 
variance  of  the  geometric  mean  are  complicated  functions 
of  n,  they  were  evaluated  for  various  values  of  n.   The 
biases  are  consistently  negative  and  larger  in  absolute 
value  than  the  biases  for  the  first  order  approximation 
to  the  maximum  likelihood  estimators.   For  example,  for 
a  =  5  and  3=2  and  n  =  10,  the  bias  is  -.01892  for  the 
geometric  mean  and  .01424  for  the  first  order  approximation 
to  the  maximum  likelihood  estimator. 

The  variance  of  the  sample  moment  estimator  of  the 
mean  decreases  as  a  increases  for  a  particular  value 
of  3.   For  example,  for  3  =  .1  and  a  ranging  from  .5  to 
10,  Var{a/a+3)  ranges  from  .08681/n  to  .00088/n. 
A  similar  relation  holds  for  the  variance  of  the  first 
order  approximation  of  the  maximum  likelihood  estimator. 
For  3  =  .1  and  a  ranging  from  .5  to  10  Var(a/a+3)  ranges 
from  .04312/n  to  .0007  9/n.   As  one  can  see,  the  maximum 
likelihood  estimator  has  smaller  variance  than  the  sample 
moment  estimator.   These  results  are  typical  of  other 
parameter  combinations.   For  almost  all  parameter  values, 
the  geometric  mean  estimator  has  a  larger  variance  than 
either  the  maximum  likelihood  estimator  or  the  sample 
moment  estimator. 
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As  a  check  on  the  comparison  between  the  sample  moment 
estimator  of  the  mean  and  the  maximiom  likelihood  estimator 
of  the  mean,  some  simulation  work  was  done.   From  a  beta 
distribution  with  parameters  a  =  3  and  g  =5,  100  samples 
of  size  5  and  100  samples  of  size  20  were  generated 
on  a  computer.   The  two  estimators  of  the  mean  were  then 
calculated  for  each  sample.   Finally,  to  compare  the 
estimators,  the  mean  square  error  for  each  estimator 
for  the  samples  of  size  5  and  the  samples  of  size  20 
was  calculated.   Let  M^  be  the  sample  moment  estimator 
of  the  mean.   The  mean  square  error  for  the  sample 
moment  estimator  is  then 


100 

ifl  ^^1  -  ^)'  /  10°  •  (3.114) 


^      2 


If  a/a+6  is  the  maximum  likelihood  estimator  of  the  mean, 
then  its  mean  square  error  is 

100     -         _ 

.^   (  777-  -  -7-  )   /  100  .  (3.115) 

1=1    a+6    a+B 

For  samples  of  size  5  from  a  beta  distribution  with 
parameters  a  =  3  and  6=5,  the  mean  square  error  for  the 
sample  moment  estimator  was  .0070148,  and  the  mean  square 
error  for  the  maximum  likelihood  estimator  was  .0056575, 
For  samples  of  size  20,'  the  mean  square  error  for  the  moment 
estimator  was  .0010910,  and  the  mean  square  error  for  the 
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maximum  likelihood  estimator  was  .0010896.   Thus,  in 
both  cases  the  maximum  likelihood  estimator  more  closely 
estimated  the  mean  of  the  distribution,  although  the 
difference  was  very  small  for  samples  of  size  20. 

Thus,  the  choice  of  an  estimator  for  either  the 
parameters  or  the  mean  of  the  beta  distribution  appears 
to  depend  on  the  characteristics  one  desires  in  the 
estimator.   If  one  wishes  to  have  unbiased  estimators, 
at  least  to  order  1/n,  then  one  should  use  the  sample 
moment  estimators.   If  one  desires  smaller  variance 
or  expected  mean  square  in  the  estimators  and  can 
tolerate  some  bias,  which  decreases  with  n,  then  one 
should  use  the  maximum  likelihood  estimators.   However, 
the  savings  in  expected  mean  squares  may  not  be  enough 
to  offset  the  difficulty  in  obtaining  the  maximum 
likelihood  estimators.   If  that  is  the  case,  then  the 
sample  moment  estimators  provide  easy  to  calculate, 
unbiased  estimators  for  the  parameters  and  the  mean 
of  the  beta  distribution. 
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CHAPTER  4 
HYPOTHESIS  TESTING  FOR  THE  BETA  DISTRIBUTION 

Introduction 

As  in  the  estimation  problem,  since  the  Dirichlet 
or  generalized  Dirichlet  distribution  is  so  closely- 
related  to  the  beta  distribution,  the  problem  of  developing 
a  test  of  hypothesis  was  considered  for  the  beta  distribution, 
In  this  chapter,  tests  will  be  developed  for  two  cases. 
First,  a  test  about  one  parameter  assuming  the  other 
is  known,  and  second,  a  test  about  one  parameter  leaving 
the  other  unspecified  will  be  given.   Since  the  beta 
distribution  is  symmetric  in  its  parameters,  a  test 
about  one  parameter,  say  a,    could  easily  be  adapted  to 
test  the  other  parameter,  g.   Thus,  all  tests  will  be 
constructed  on  the  parameter  a-   Finally.,  it  will  be 
shown  that  the  test  for  a  leaving  g  arbitrary  can  be 
adapted  to  test  an  hypothesis  about  the  mean  of  the 
beta  distribution,  a/a+B- 

The  types  of  hypotheses  to  be  considered  are  generally 
known  as  one-sided  and  two-sided  hypotheses.   It  is  desired 
to  test  either  that  a  is  less  than  some  constant,  a  is 
greater  than  some  constant,  or  a  is  equal  to  some 
constant,  both  fixing  and  not  fixing  6.   It  is  further 
desired  to  find  the  best  possible  test  for  each  of  these 
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hypotheses.   Fortunately,  because  the  beta  distribution 
is  a  member  of  the  exponential  class  of  distributions, 
such  a  best  test  can  be  found. 

One-sided  Tests  for  g  when  g  Is  Known 

In  this  section,  one-sided  tests  of  hypotheses  about 
a  when  B  is  specified  will  be  developed.   From 
Ferguson  (1967),  consider  the  following  definitions. 

Definition  4.1:A  test  cj)  of  Hq  :  QeQ^   against  H^ :  deQ^ 

is  said  to  have  size  a  if  sup  E  (j)  (X)  =  a. 

QeQ,    9 

Definition  4.2;  A  test  4)q  is  said  to  be  uniformly 

most  powerful  (UMP)  of  size  a  for  testing 

Hq:  ee0Q  against  H^ :  GeG^  if  (})q  is  of  size  a 

and  if  for  any  other  test  (j)  of  size  at  most  a 

Eg<},Q(X)  >  Eg(()(X)  for  each  QeQ^. 

For  the  beta  distribution,  a  typical  hypothesis  to  be 

considered  is,  H^ :  a^a^  against  H  :  a<a  .   Now,  consider 

the  following  theorem  from  Ferguson. 

Theorem  4.1;  If  the  distribution  of  X  has  monotone 

likelihood  ratio,  any  test  of  the  form 


(X)  =  I 


1  if  X  >  X 

Y  if  X  =  Xq      (5.24) 

0  if  X  <  Xq 


has  nondecreasing  power  function.   Any  test  of 
the  form  (5.24)  is  UMP  of  its  size  for  testing 
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^0*  ®-^0  ^^ainst  H^ :  e>e   for  any  SqEQ,  provided 

its  size  is  not  zero.   For  every  0<a<l  and  every 

QqCQ,    there  exist  numbers  -'=°±Xq±°°   and  0<y<l 

such  that  the  test  (5.24)  is  UMP  of  size  a, 

for  testing  H. :  e<e-  against  H, :  e>e„. 
0—0  1     0 

A  similar  statement  would  hold  for  testing  the  hypothesis 

^0"  ®-®0  ^9^^"^^  ^1=  ^"^^0*  ^°^   ^^^  proof  of  this  theorem, 
the  reader  is  referred  to  Ferguson  (1967)  .   Since  the  beta 
distribution  is  a  continuous  distribution,  y   may  be 
taken  as  0.   In  the  notation  of  Ferguson,  the  test  is 
given  as  a  function  (})  (X)  which  takes  values  either  0  or  1. 
If  4)  (X)  is  1,  then  the  null  hypothesis  is  rejected,  while 
if  (})  CX)  is  0,  the  null  hypothesis  is  not  rejected.   It 
should  be  noted  here  that  the  X  Ferguson  refers  to  is  a 
sufficient  statistic  for  the  parameter  in  question. 
The  problem  is  always  reduced  from  a  sample  of  observations 
to  a  sufficient  statistic  or  statistics. 

Consider  now  the  following  hypothesis,  H^ :  a>_a.  against 
Hj^:  a<aQ  with  B  specified.   By  the  theorem  from  Ferguson 
a  UMP  test  for  this  hypothesis  exists  and  is  of  the  form 


(t)  = 


1   if   t  ■<  c 

(4.1) 
0   if   t  >  c 


where  t  is  the  sufficient  statistic  for  a.   Note,  for  this 
hypothesis,  since  B  is  specified,  the  distribution  contains 
only  one  parameter,  a.   Thus,  there  is  one  sufficient 
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statistic  for  a.   To  determine  the  test  completely,  a 
sufficient  statistic,  t,  must  be  found  and  the  constant, 
c,  determined  so  that  the  test  is  of  a  given  size,  say 
.05.,  Since  the  beta  distribution  with  specified  3 
parameter  is  a  member  of  the  exponential  class  of  distri- 
butions, the  sufficient  statistic  for  a  is  easily  found. 


f  (X,a)  =   r(a+B)    x"""-^  (l-X)^"-^ 
r(a)r(6) 


(4.2) 


and 


L(X^,...,X^,a)  = 


n  n 


r (g+B)  ' 

_r(a)r(6) 


n  x""^i-x.)^"^ 
i=i  ^ 


r(a+3) 


_r(a)r(6) 


exp{a  In  II  X.  } 
i=l  ^ 


n  n 

exp{-ln  n  X.  +  (B-l)ln  n  (1-X . ) } .  (4.3) 
i=l  i=l    ^ 


Thus,  In  n  X.,  or  equivalently  n  X.,  is  a  sufficient 

i=l  ""  i=l  "- 

statistic  for  a.   Thus,  the  test  is  of  the  form 


(t)  = 


1  if   n  X.  <  c 
i=i  ^  - 


0  if   n  Xj^  >  c 

i=l 


(4.4) 


The  problem  remains  to  find  the  constant,  c,  so  that  the 
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test  is  of  a  given  size.   Assume  a  test  of  size  .05  is 

desired.   Then,  c  can  be  determined  from  the  following 

n 
probability  statement,  Pr  [   II  X.  £  c]  =  .05.   To  evaluate 

i=l  ^  n 

the  constant  some  knowledge  of  the  distribution  of   II  X. 

n      i=l  ^ 
or  an  approximation  to  the  distribution  of   n  X.  or  a 

n  i=l  ^ 

function  of   II  X^  is  necessary.   Since  an  exact  distribution 


i=l 
IS  n 

were  considered. 


for   n  X.  was  not  obtainable,  several  approximations 
i=l 


Normal  Approximation 

The  first  approximation  considered  was  a  normal 

n 
approximation.   The  probability  statement  Pr[   H  X-  <  c  ]  = 

n  i=l 

.05  is  equivalent  to,  Pr [ (  Z  In  X. ) /n  <  (In  c)/n]  =  .05. 

n  i=l    ^    - 

Since  (  I    In  X. )/n  is  the  mean  of  the  In  X.'s,  then 

i=l    ^  n    ^ 

asymptotically  the  distribution  of  (  E  In  X . ) /n  will  be 

i=l     ""  9 
normal  with  some  mean,  y,  and  some  variance  ,a'^/n.   Thus, 


Pr 


n  In  X. 
i=l 


In  c 


n   —  n 


=  .05 


(4.5) 


is  equivalent  to 


Pr 


In  c 


Z  <   n 


a//n 


=    .05 


C4.6) 


where  Z  is  a  standard  normal  random  variable.   The 
probability  can  then  be  evaluated  from  standard  normal 
tables  and  the  constant,  c,  can  be  found.   From  standard 
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normal  tables,  the  above  probability  statement  implies 


In  c 


-  y 


a//n 


=  -1.645 


(4.7) 


or 


In  c  =  [  -1.645  a//n  +  y  ]n 


(4.8) 


Thus, 


c  =  exp{ [  -1.645  a//n  +  y]n} 


=  exp  [  -1.645  /na   +  ny]  . 


(4.9) 


All  that  is  needed  now  is  to  evaluate  u  and  a    . 


y  =  E 


E  In  X. 
i=l 


=  E(ln  X. ) 
1 


=  4'(a)  -  y(a+3)     and 


a^  =  Vardn  X^)  =  m'   (a)  -  ^'  (a+3) 


(4,10) 


from  results  of  Chapter  3.   Thus,  (  I  In  X.)/n  is 

i=l    ^ 
asymptotically  distributed  as  N{  ['I' (a) -^  (a+6)  ]  ,  [^  '  (a) ->!"  (a+B)/n]  } 

The  constant,  c,  is  given  for  various  combinations  of  a, 

B,  and  n  in  Table  4.   V^here  c  was  too  small  to  be 


91 


evaluated,  In  c  is  given.   For  those  values,  the  test 
could  be  written  as 


(t)  =  J 


1   if    Z  In  X.  <  In  c 
i=l 


0   if    E  In  X.  >  In  c 


i=l 


(4.11) 


For  the  hypothesis  Hq  :  OL<a.Q   against  H,  :  a>a-,  the 
UMP  test  would  be  of  the  form 


<}>(t)  = 


1  if   n  X.  >  c 

i=l  ^ 


0   if    IT   X.  <  c 


i=l 


(4.12) 


The  probability,  Pr [   n  X.  >  c  ]  =  .05,  could  again  be 

i=l  ^ 
evaluated  by  the  normal  approximation  to  find  an 

approximation  to  c. 


Beta  Approximation 

To  obtain  an  approximation  to  the  distribution  of 
n 

n  X.  directly,  three  methods  were  attempted.   First, 
i=l  n 

the  first  two  moments  of   II  X.  were  equated  to  the  first 

i=l 
two  moments  of  a  beta  random  variable.   This  is  intuitively 

appealing  since  the  beta  distribution  takes  on  a  wide 

variety  of  forms  for  various  choices  of  its  parameters. 

n 
Since   n  X.  lies  on  the  interval  (0,1),  it  will  most 

i=l 
probably  be  adequately  approximated  by  some  beta  distribution. 
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From  the  two  moment  equations,  the  unknown  parameters  of 
an  approximating  beta  random  variable  can  be  determined. 
Assume 

f(X  ,a)  =   r(a+g)  X^~-^{l-X.)^~^      i=l,...,n  .      (4.13) 
^      r(a)r(3)   ^      1 


Let 


m^  =  E(  _n  X^  )  =  (a/a+B)^   and 


n 

n 

i=l 
n 


m2  =  E(   n  X   )  =  [a(a+l)/(a+6)  (a+3+1)]^  .         (4.14) 
i=l 


Let  Y  be  a  beta  random  variable  with  parameters  p  and  q. 
Then 


E(Y)  =  p/p+q    and 

E(Y^)  =  p(p+l)/(p+q)  (p+q+i)  .  (4.15) 


Equating  the  moments, 

m  =  p/p+q    and 

m2  =  p(p+l)/(p+q)  (p+q+1)   ,  (4.16) 


Solving  for  p  and  q, 
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p  =  "^l^"^r"^2)   and   q  =  (l-^)  (^-^) 


m^-m^ 


"^2~^1 


(4.17) 


Thus,  for  testing  H  :  a>_a   against  H  :  a<a  ,  the  test 
is  of  the  form 


(t)  = 


1  if  t  £  c 
0   if   t  >  c 


(4.18) 


c  is  found  as  before  from  the  probability  statement, 

n 
PrI   n  X^  <_  c  ]  =  .05,  which  for  this  approximation  is 

i=l 
approximately  equivalent  to,  Pr[  Y  <  c  ]  =  .05,  or 


-C 


r  (p+q) 


7^ 


yP-^i-Y)^-^  dY  = 


.05 


q) 


(4.19) 


This  integral  was  evaluated  by  a  numerical  integration 
program,  which  calculates  areas  under  curves.   In  Table  5 
are  given,  for  various  values  of  a,  3,  and  n,  the 
parameters  p  and  q  of  the  approximating  beta  random 
variable  and  the  value  of  c  for  a  .05  size  test.   For  some 
of  the  larger  sample  sizes,  the  moments  m,  and  m„  were 
not  computationally  different  from  zero  and,  hence,  p,  q, 
and  c  could  not  be  determined.   Even  when  p  and  q  could 
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be  found,  it  was  not  possible  to  find  c  when  either 
parameter  was  very  small  or  very  large.   For  a  large 
parameter  value,  the  program  could  not  evaluate  the 
gamma  functions  of  the  integral.   For  a  small  parameter 
value,  the  distribution  is  concentrated  too  close  to 
0  or  1  for  the  program  to  obtain  a  value  for  c. 

A  test  of  the  hypothesis  Hq  :  a<_aQ  against  H^ :  a>aQ 
with  3  specified,  is  of  the  form 


(   n  X.  )  =  - 


i=l 


1   if   n  X-  >  c 
i=l   ~ 


0  if   n  X.  <  c 
i=l  ^ 


(4.20) 


To  evaluate  Pr [   n  X.  >  c  ]  =  .05,  again  approximate 

i=l   n 
the  distribution  of   11  X .  by  a  beta  random  variable, 

i=l  ^ 
Y,  and  note  that 


05  =  Pr(   n  X.  >  c  ) 
i=l  ^  ~ 


-  Pr(  Y  >  c  ) 


^  1-Pr[  (  Y  <  c  )] 


=  1-[1-Pr  (  V  <  1-c  )] 


=  Pr{  V  <  1-c)  , 


(4.21) 


where  V  has  a  beta  distribution  with  parameters  q  and  p, 
that  is,  the  parameters  are  reversed  from  the  distribution 
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of  Y.   Thus,  1-c  could  be  found  by  the  previous  method 

and  then  c  would  be  determined. 

A  comparison  of  the  values  of  c,  where  available, 

in  Table  5  with  those  in  Table  4  gives  some  idea  of  how 

this  approximation  compares  to  the  normal  approximation, 

for  testing  H^ :  a>aQ  against  H^ :  a<a  .   It  should  be  noted 

that  in  all  cases  the  value  of  c  obtained  for  the  normal 

approximation  is  smaller  than  that  for  the  moment 

approximation.   Hence,  the  rejection  region  is  larger 

for  the  moment  approximation  than  for  the  normal 

approximation.   However,  as  the  parameters  of  the  X.'s, 

1 

a  and  g,  increase  and  as  n  increases,  the  values  of  c 

for  the  two  methods  become  very  close.   For  example, 

at  a=.5,  B=.l,  and  n=5,  c  for  the  beta  approximation 

is  .0004857  and  for  the  normal  approximation  c  is 

.001825.   At  a=10,  6=.l,  and  n=5 ,  c  for  the  beta 

approximation  is  .818  and  for  the  normal  approximation 

c  is  .840,   At  a=10,  B=.l,  and  n=20,  c  for  the  beta 

approximation  is  .609  and  for  the  normal  approximation 

c  is  .636.   Some  further  study  is  needed  to  determine 

which  approximation  works  better  in  the  cases  where 

the  two  values  of  c  are  much  different. 

Two  other  moment  approximations  were  considered, 
n 

Since   n  X.  is  a  product  of  n  beta  random  variables, 

i=l  n 

the  first  two  moments  of   IT  X^  were  equated  to  the  n^^ 

th  "'"""'" 

and  2n   moments  of  a  beta  random  variable.   Also,  the 

n   ^  , 
first  two  moments  of   n  x/   were  equated  to  the  first 

i=l 
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two  moments  of  a  beta  random  variable.   However,  neither 

of  these  methods  gave  a  reasonable  approximation  for 

n 
the  distribution  of   EX-.   For  example,  for  a=2,  3=.5, 

1=1 
and  n=5,  the  first  method  gives  an  approximating 

distribution  with  parameters  p=7.3  and  q=2.5.   This 

distribution  has  its  mean  at  about  .75.   However,  for 

n 
the  distribution  of   II  X .  ,  one  would  expect  a  mean  much 

i=l  ^ 
smaller  since  the  mean  for  each  X.  is  only  .8.   For  the 

second  method,  at  a=B=.l  and  n=5  the  approximating 

distribution  has  parameters  p=4.17  and  q=.53.   This 

distribution  has  a  mean  of  about  .9.   But  by  the  form 

of  the  distribution  of  each  X. ,  one  would  again  expect 
n 

n  X.  to  have  a  far  different  distribution.   Hence, 
i=l 
neither  of  these  methods  of  approximating  the  distribution 

n 
of   IT  X.  seems  to  work  well. 
■i=l 

Two-sided  Tests  for  a  When  g  Is  Known 

For  a  two-sided  hypothesis  of  the  form  H- :  6=8 
against  H-j^:  QJ=Qq   a  UMP  test  does  not  exist.   However, 
a  UMP  unbiased  test  does  exist.   Consider  the  following 
definition  and  theorem  from  Ferguson  (1967)  . 

Definition  4.3;  A  size  a  test  4)  of  H  :  See   against 
H-j^:  GeG-j^  is  said  to  be  unbiased  if  Egtj)  (X)  _>  a 
for  all  ee0  . 


97 


Theorem  4.2:  Suppose  the  distribution  of  the  random 

observable  X  has  the  form  (5.34)  [the  exponential 
class  of  distributions] .   Then,  given  any  test 
(j)  and  BqcQ   there  exists  a  two-sided  test  4)' 
in  the  form 


'  1   if   X  <  X   or  X  >  X 


(j)'(X)  =  J  Yj^  if   X  =  x^,    i=l,2 


0   if   X   <  X  <  X 


for  which  Eg  4)'  (X)  =  Eq  <^  (X)    and 
0  0 


d   E(|)'(X) 

3e  ® 


=  d_  E„4)(X) 

de  " 


Moreover,  for  any  such  two-sided  test  (f) '  and 

for  all  ee0  E^(J)'(X)  >  E^(J)  (X)  . 
9       —   9 

For  testing  H^ :  9=9^  against  H^ :  9=[=9   Ferguson  shows  that 


the  two  conditions  Eq  (t>'  (X) 

O  0 


Eo  (J)  (X)  and  d   En*'  (X) 
^o  d9   ^ 

d_  E  (J)(X)       are  equivalent  to  E   4)(X)  =  .05  and 
d9   0      9=9o  Qq 

E  X<})(X)  =  aE  X.   Thus,  out  of  all  the  tests  which  are 

D  0  0  0 

unbiased,  the  two-sided  test  given  in  the  theorem  is 
uniformly  most  powerful  for  testing  a  two-sided  hypothesis. 
Consider  now  the  hypothesis  Hq  :  a=aQ  against  H-|_ :  a+ag- 
A  UMP  unbiased  test  for  this  hypothesis  is  then  of  the  form 


(t)  = 


'  1   if   t  <  t   or   t  >  t2 
0   if   tj^  £  t  <  t2 


(4.22) 
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where  t  is  again  a  sufficient  statistic  for  a,  t  and 
t2  are  constants  to  be  found,  and  y^  from  the  theorem 
may  be  taken  to  be  0  as  before.   Thus,  it  is  necessary 
to  evaluate  Pr[  t  <  t  or  t  >  t^  ]  =  .05. 

Normal  Approximation 

Using  the  normal  approximation  again,  the  probability 
statement  Pr  [  t  <  t-j^  or  t  >  t2  ]  is  equivalent  to 

n 
PrKln  t  )/n  <  (  Z  In  X.)/n  <  (In  t^)/n]  =  .95 
-L       i=l    1    -      ^ 


Pr 


In  t 


1  -  y 


In  t 


2  -  y 


<  Z  <    n 


_   a/^n 


a//n 


=  .95 


(4.23) 


t   and  t  may  now  be  found  approximately  from  the 
standard  normal  distribution.   The  second  condition  of 
Theorem  4.2  implies,  for  the  normal  distribution,  that 
t  =  -t   and  thus  for  a  .05  size  test 


In  t 


1  -  y 


In  t. 


=   -1.96  and 


2  -  y 


=  1.96 


(4.24) 


o//n 


a//n 


Thus,  approximations  to  t   and  t  may  be  found  and  the 
test  is  fully  determined. 
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Beta  Approximation 

A  test  of  the  hypothesis  H  :  a=a   against  H  :  a^a.^ 
with  3  specified  is  of  the  form 


(  n  X. )  =  \ 
i=l  ^ 


n 

n 

i=l 


1   if    n  X.  <  t,  or   II  X.  >  t, 

i=l  1     ^ 


0  if  t,  <  n  X.  <  t^ 

1  -  i=l  ^  -   2 


(4.25) 


For  this  test  to  be  UMP  unbiased  the  two  conditions 
given  previously  must  be  met. 


E„  [4>(  n  X.  )]  = 
"o   i=l  "■ 


.05   and 


n       n 
E   [(  n  X.  )c^(  n  X.)]  = 
'^O   i=l  "■   i=l  ^ 


.05  E   (  n  X  ) 

"'o  i=l  ^ 


(4.26) 


These  two  conditions  allow  determination  of  t^  and  t-, 

As  before,  an  approximating  beta  random  variable,  Y, 

n 
will  be  used  in  place  of   EX..   Thus, 

i=l  ^ 


05  =  E,^  [(!)(¥)]    or 


ft- 


95  = 


r  (p+q)   yP'H-Y)^"-'-  dY 


r(p)r(q) 


(4.27) 


100 


and 


.05  E   (Y)  =  E   [Y  (j)(Y)]   or 
"O        0 


.95  p  = 
p+q 


^2 

P         q-1 
r  (p+q)   Y^    (1-Y)^    dY         (4.28) 

r(p)r(q) 
^1 


The  solution  of  these  two  equations  for  t,  and  t„  would 

^  12 

be  quite  difficult  even  with  the  aid  of  a  computer,  and 
hence,  their  solution  for  various  values  of  a,  B,  and  n 
has  not  been  attempted. 

One-sided  Tests  for  a  When  g  Is  Unknown 

As  in  the  case  when  B  was  specified,  hypotheses  about 
a  when  g  is  unspecified  may  be  tested.   Forms  for  the  test 
in  such  cases  are  also  given  in  Ferguson  (1967) .   Consider 
the  following  general  testing  situation:  given  a  k-parameter 
exponential  class  distribution  with  parameters  (6  ,...,G  ) 

-L         JC 

find  a  UMP  unbiased  test  for  Hq  :  9-iie?  against  H.  :  0,>e°. 
If  the  problem  is  first  reduced  to  sufficient  statistics 
(t-|^,  .  .  .  ,tj^)  ,  then  Ferguson  shows  that  a  UMP  unbiased 
test  for  the  above  hypothesis  conditional  on  (t2/.../t,  ) 
is  of  the  form 


.0(t^,...,t3^)  = 


if  t^>z(t2,...,tj^) 


Y  (t2,  .  .  .  ,tj^)  if  t-L=z  (t2,  .  .  .  ,tj^) 


if  t^<z(t2,...,tj^)  (4.29) 


101 


where  again  by  the  nature  of  the  beta  distribution  y 
may  be  taken  as  zero  and  z  is  determined  so  that  the 
test  is  of  a  given  size.   Ferguson  then  shows  that  this 
same  test  is  also  UMP  unbiased  for  the  unconditional 
problem,  that  is,  testing  Hq  :  Q^ieJ  against  H^:    e-|_>e^. 
A  similar  test  would  exist  for  testing  Hq  :  Q-,^Q\    against 


^1-   1  "1 


<e. .   It  would  have  the  form 


(tT,...,tv)  = 


0   if  t^   >    z(t2,...,tj^) 


(4.30) 


Consider  now  the  hypothesis  for  the  a  parameter  of 
the  beta  distribution,  Hq  :  a>jXQ    against  H-,  :  a<aQ,  with 
3  unspecified.   The  test  would  be  of  the  form 


(ti,t2)  = 


1   if   tj^  <_  z  {t2) 
0   if   t^  >  z(t2) 


(4.31) 


The  problem  now  is  to  find  joint  sufficient  statistics 
^^l'^2^  for  the  parameters  of  the  beta  distribution  and 
then  to  determine  z(t2)  so  that  the  test  is  of  a  given 
size.   Joint  sufficient  statistics  are  easily  determined 
since  the  beta  distribution  is  a  member  of  the  exponential 
class. 


f  (X,a,6)  =   r  (g+B) 

r(a)r(3) 


x°'-^i-x)^-i 


(4.32) 
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L(X,  ,...,X^,a,B)    = 


r  r  (g+B)  " 

Lr(a)r($)_ 


n  n 


a-l      B_i 

n  X.   (i-xj  ^ 
i=i  "■     1 


r  r  (g+B)  1 

Lr(a)r(B)J 


exp{gln  H  X.+Bln  K  (1-X.)} 


l=x 


i=l 


exp{-ln  n  X.  -  In  n  (1-X.)}. 
i=l        i=l    ^ 


(4.33) 


Thus,  (In   n  X. ,  In   K  (1-X.))  or  equivalently  (  n  X.,  H  (1-X.)) 

i=l       i=l    ^  1=1  ^  i=l    ^ 

are  joint  sufficient  statistics  for  {a,B)'      Hence, 


the  test  takes  the  form 


(  n  X. )  =  I 
i=l 


1  if   n  X-  <  z(  n  (1-X. ) ) 
i=l       i=l   ^ 

n         n 

0  if   n  X.  >  z (  n (i-x- ) )  . 

i=l       i=l 


(4.34) 


The  problem  then  remains  to  find  z(  n  (1-X.)).   Recall 

i=l    ^ 
that  the  test  is  exactly  the  same  for  the  unconditional 

problem  as  it  is  for  the  conditional  problem,  fixing 
n  n 

n  (1-X.).   In  the  conditional  problem,  z(  n  (1-X.)) 
i=l    ^  i=l    ^ 

is  simply  a  constant.   Thus,  to  determine  the  exact 

form  of  the  test,  the  following  probability  statement 

n        n 
must  be  evaluated,  Pr  [   n  X.<c|   II  (1-X.)=t]  =  .05. 

i=l       i=l 
A  technique  for  evaluating  this  probability  statement 

will  be  given  in  general  for  a  sample  (X-j^ , .  .  .  ,X  )  . 

However,  to  fix  the  ideas  of  the  method  for  the  general 
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case,  consider  a  sample  of  size  two,  (X  ,X  ) .   Refer  to 
Figures  1  and  2.   It  is  desired  to  find  c  such  that 
Pr[Xj^X2<c|  (1-X^)  (1-X2)=t]  =  .05.   The  joint  distribution 
of  X,  and  Xy    is 


f(X^,X2)  = 


r  r(a+B) 

Lr(a)r(B)_ 


{X^X2)°'  ■'•[  (1-X^)  (1-X2)]^'-'-.  (4.35) 


Transforming  X^  and  X2  to  the  order  statistics  Z,  =  X,, 


(1) 


and  Z2  =  ^(2) 


g(Z  ,Z  )  =  2r  r  (g+B)  ' 

Lr(a)r(3). 


(Z^Z2)°'~"^[(1-Z^)  (1-Z2)]^~"'".  (4.36) 


Next,  transform  Z^  and  Z2  into  Y,  =  (1-Z, )  (I-Z2)  and 


Yj  =  Z2.   Then 


h(Yj^,Y2)  =  2 


"  r (g+B)  " 

r(g)rCB) 


1-Y  -Y 
2   1 


1-Y 


2  J 


g-1 


g-1  3-1 
Y^   Y,    1 (4.37) 

r=Yr 


since  Z^^  =  1-CY^/1-Y2)  and  Z2  =  Y2  implies  the  Jacobian 
of  the  transformation  is  J  =  I/I-Y2.   Now,  the  probability 
statement,  Pr rx^X2<c |  (1-X^)  (I-X2) =t]  =  .05  can  be 
transformed  into  the  (Y^ ,Y2 ) -space  and  evaluated  in  that 
space.   First  in  the  (Z^, Z2) -space ,  the  probability 
statement  is  equivalent  to  Pr  [Z-j^Z2<c  |  (1-Z-j^)  (1-Z2)=t]  =  .05 
where  0<Z  <Z2.   As  can  be  seen  from  Figure  1,  this 
probability  statement  can  be  evaluated  by  first  integrating 
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1-t 


1-t        1 


(1)     2.^=0 


(4)     Z^Z2=c 


(2)  Z^=Z2 

(3)  (1-Z^)  (1-Z2)=t 


Figure   1 


(3)    Y3^=t 


(1)    Y2=l-Yi 


(4)    Y2(1-(Y-l/1-Y2))=c 
(2)    Y2=1-/Y3^ 


Figure    2 
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along  the  curve  (1-Z-j^)  (1-Z2)=t  from  Z.j^=0  to  1i=l2   ^^i^ 
dividing  the  joint  density,  g{Z-|^,Z2),  by  the  result  of 
that  integration  to  obtain  the  conditional  density, 
^1^^1^2l  (^"Zj^)  {1-Z2)=t)  ,  and  then  integrating  g-j^  along 
the  curve  (l-Z^^^)  (1-Z2)=t  from  Z^=0  to  Z  Z  =c.   In  Figure  2 
the  lines  and  curves  are  shown  in  the  (Y-i  ,¥2) -space. 
Here  the  probability  statement  may  be  evaluated  by  inte- 
grating along  the  line  Y-L=t,  first  from  Y2=1-/Y3_  to 
Y2=l-Yj^  to  find  the  conditional  density  h-,  (Y2 1  Y,=t)  , 
and  then  from  Y2=1-/y^  to  Y2  (1- (Y-,^/l-Y2)  )=c.   Since 
the  integrations  are  much  more  easily  performed  in  the 
(Y^,Y2) -space,  the  probability  statement  will  be 
evaluated  in  that  space.   The  first  integration  is 

,1-t 
K  =        h(Y-L,Y2)  dY2  .  (4.38) 

il-/t 

Hence,  the  conditional  density,  h^(Y-^|Y2=t)  is  h{Y^,Y2)/K. 
Because  of  the  nature  of  h(Y,,Y2),  the  integral  cannot 
be  evaluated  explicitly,  but  must  be  approximated  by  a 
method  of  numerical  integration.   However,  many  good 
methods  of  numerical  intergration  are  available  and  very 
close  approximations  should  be  attainable.   From  the 
conditional  density  h  (Y  |y  =t),  the  probability 
statement,  Pr  [Xj^X2<c  |  (1-X^)  (I-X2)  =t]  =  .05  can  be 
evaluated.   It  is  an  integral  from  the  curve  Y2=l-v^-, 
to  the  curve  Y2  (1- (Yj^/1-Y2)  )=c  along  the  line  Y-,=t. 
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The  intersection  of  Y2  (1- (Y-]^/l-Y2)  )=c  ^^^  ^1=^  ^^ 


Y2(l-(t/l-Y2))  =  c  or 


v^ 2 

Y  =  (1+c-t)  +   (t-c-1)  -4c 
2  o 


(4.39) 


Letting 


C  = 


(1+c-t)  +   (t-c-1) ^-4c  , 
2 


(4.40) 


then 


c  =  c'  -  c't  -  (c') 

j-^T 


(4.41) 


Hence,  the  probability  statement,  Pr[X,X2<c| (1-X,) (1-X2)=t] 
=  .05,  is  equivalent  to 


1-t 


c' 


^(^l'^2) 


dY2  =  .05  . 


(4.42) 


This  integral  must  be  evaluated  for  c'  by  a  method  of 
numerical  integration.   Thus,  the  test  of  the  hypothesis 
H  :  OL>a.      against  H  :  a<a   is  completely  determined  as 


<^{X^,X2)    =   ' 


'1^21 
0   if   Xj^X2>  c 


(4.43) 
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For  a  sample  of  size  three,  (X^,X2,Xj) ,    the  procedure 
is  the  same,  but  the  limits  of  integration  to  find  K 
and  finally  to  evaluate  the  probability  statement  are  a 
bit  more  difficult  to  obtain.   Again,  the  transformation 
will  be  made  from  iX^,X2'^3^    ^°    (Z^,Z2,Z3),  the  order 
statistics,  and  then  to  (Y3^,Y2,Y3)  where  Y-l=  (l-Zj^)  (I-Z2)  (I-Z3)  , 
^2~^2'  ^^*^  ^3"^  ^3*   '^°^  refer  to  Figures  3  and  4.   First, 
to  find  K,  the  restrictions  in  the  (Z-|^,Z2 ,  Z3) -space 
are  0<Zj^<Z2<Z3<l  and  (l-Z^^)  (I-Z2)  (1-Z3)=t.   In  the 
(Y^,Y2,Y3) -space  Zj^=0  and  (1-Z^)  (I-Z2)  (1-Z3)=t  is 

equivalent  to  (1-Y„)  (l-Y^)=t,Z  =Z^  and  (l-Z J  (1-Z^)  (1-Z^)=t 
^      -J      12  1      2      3 

is  equivalent  to  (I-Y2) ^ (1-Y3)=t ,  and  Z2=Z3  and 
(1-Z^)  (I-Z2)  {1-Z3)=t  is  equivalent  to  Y^=t.   In  this  case, 
K  will  be  an  integral  over  both  Y2  and  Y3  since  it  is 
desired  to  find  the  conditional  density,  conditioned 
on  Y-^.      From  Figure  4,  the  region  to  be  integrated  is 
the  singly  striped  region.   For  the  outside  integral, 
over  Y2,  the  lower  limit  is  evidently  zero  since  neither 
of  the  equations  (I-Y2) (1-Y3)=t  or  (I-Y2) ^ (1-Y3)=t 
imposes  a  lower  limit  on  Y2  alone.   Each  may  be  satisfied 
with  Y«=0.   The  upper  limit  occurs  when  Y^=Y-  and  is 
then  the  maximum  of  (1-Y2)^=t  and  (l-Y2)-^=t  or  at 
Y2=l-/t.   Thus,  the  integral  over  Y2  goes  from  0  to  l-/t, 
or  from  a  to  b  in  Figure  4.   For  the  integration  over  Y3 , 
Y2  may  be  considered  fixed.   Thus,  the  two  equations 
to  determine  the  limits  on  Y3,  (I-Y2) (1-Y3)=t  and 
(I-Y2)  (1-Y3)=t  can  be  solved  for  Y3  in  terms  of  Y2 , 
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(1)  (i-22);(i-Z3)-t 

(2)  (1-Z2)2(l-Z3)=t 
a-Z-^)   (1-Z2)^=t 


Figure  3 


(1)  (I-Y2)  (1-Y3)=t 


]Y2Y3=c 


(2)  (1-Y2)^(l-Y3)=t 


Figure  4 
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that  is,  Y3=l-(t/l-Y2)  and  ¥3=1- (t/ (I-Y2) ^) .   Since 
l-rt/Cl-Y2r>l-Ct/Cl-Y2)^)  ,  the  integral  on  Y3  would  go 
from  l-(t/(l-Y2)^)  ^°   l-(t/l-Y2).   There  is  one  slight 
complication,  however.   The  equation  ¥3=1- (t/ (l-Yj) ^) 
does  not  take  into  account  the  fact  that  Y2<Y_,  which 
must  be  the  case  since  22=Y2<Y2=Z2.   But,  this  may  be 
accounted  for  by  adjusting  the  lower  limit  to  be  the 
maximum  of  Y2  and  1- (t/ (I-Y2) ^) .   Hence, 


fl-/t  fl-(t/l-Y2) 
K  =  f  (Y3^,Y2,Y3)  dY3dY2  .  (4.44) 

maxlY2,l-(t/Cl-Y2)^)J 


To  determine  c  for  the  test,  the  further  restriction 
X^X2X3<c  is  put  on  the  problem.   In  the  (Y,Y,Y) -space, 
this  is  equivalent  to  [1- (t/ (1-Y  ) (1-Y  ) ) ] Y  Y  <c. 
In  Figure  4 ,  the  region  to  be  integrated  is  now  the 
doubly  striped  region.   This  puts  no  new  restriction  on 
Y2  alone,  but  does  restrict  the  lower  limit  of  Y3  for  a 
given  value  of  Y2 .   Thus,  the  lower  limit  of  integration 
on  Y3  will  change  from  that  used  to  find  K.   There  are 
now  three  conditions,  the  maximum  of  which  will  be  the 
lower  limit  of  Y3:  Y2=Y3,  1- (t/ (I-Y2 ) ^ )=Y3 ,  and 
[l-(t/(l-Y2)  (I-Y3) )]Y2Y3=c.   The  last  condition  is 

equivalent  to  (-b  +  ^b2-4ad  )/2a,   where  b=Y2  (1-c-t) -y|+c , 

2 

a=Y  -Y  ,  and  d=cY  -c.   Thus,  the  probability  statement. 
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Pr[X^X2X2<c|  (1-X^)  (l-X^)  (1-X2)=t]    =    .05 


(4.45) 


may  be  evaluated  by 


1-^ 


fl-(t/l-Y2) 


f  (yj^Y2Y3)dy3dy2 


(4.46) 


where 


Ft  =  max 


1- 


(I-Y2) 


2'  '2. 


/I 

-b  +  '^b  -4ad 

2d 


(4.47) 


where  a,  b,  and  d  are  given  above. 

The  problem  may  now  be  solved  for  a  sample  of  size 

n,  (X^,...,X^).   Again  transform  from  the  X.'s  to  the 

Z^'s  and  then  to  the  Y j^ '  s .   The  restrictions  in  the 

n 
(Zj^,  ...  ,Z^) -space  are  0<Z^<Z2<  .  .  .  <Zj^  and   n  (1-Zj^)=t. 

i=l 
In  the  (Y-j^, ...  ,Yj^) -space  the  restrictions  are 
n  9   ^ 

n  a-Yi)=t,  Cl-Y2)'^   n  (1-Yi)=t,  and  Y;L=t.   K  is  then 
i=2  i=3 

is  then  an  (n-1) -dimensional  integral.   Integrating 

Y2  as  the.  outside  integral,  its  limits  are  from  0  to 

the  maximum  value  of  Y2  for  the  two  equations  above. 

Since  Y2<Y3<...<Y^  both  equations  will  be  maximized 

for  Y2  when  y2=Y3=Y^=. . .=Y^.   Thus,  the  upper  limit 

of  Y2  will  be  the  maximum  of  l-'^/t  and  l-^~-^/t ,  or 

1-"^"  /t.   Taking  the  integral  over  Y^  next,  Y2  will  be 


Ill 


fixed.   The  lower  limit  will  be  Y  and  the  upper  limit 

will  be  the  maximum  value  of  Y-,  for  the  two  equations 

n 
with  Y2  fixed,  that  is,   n  (1-Y^)=t/1-Y2  and 

n  2     i=3 

n  (l-Y. )=t/ (1-Y„)  .   Again  the  maximum  occurs  when 

i=3    ^  _o  

Y3=Y4=...=Y^  or  at  Y2=l-^  ^/ETT^Y^.   The  limits  for 

Y^  through  Y   -,  will  be  obtained  similarly.   For  Y  , 

the  lower  limit  will  again  be  a  maximum  of  Y^^.-j^  and  the 

2   ^ 
solution  of  the  equation  (1-Y  )^   R  (1-Y.)=t  with 

^    i=3    ^ 

^2***^n-l  fi^®^*   Thus,  the  lower  limit  will  be  the 

2  "^"^ 
maximum  of  Y  _-,  and  l-t/(l-Y2)    IT  (1-Y-)  and  the  upper 

n-1  i=3 

limit  will  be  1-t/  n  (1-Y.). 

i=2 

Finally,  to  find  c  for  a  test,  one  further  restriction 

is  placed  on  the  lower  limit  of  the  integral  for  Y  . 

That  lower  limit  will  now  be 


max  lY^_-^,l- 


2   n-1 
(1-Y.)    n  (1-Y.) 
"^    i=3     ^ 


,  -b+'^b^-4ad 
2d 


(4.48) 


where 


n-1 


a=  n  Y.  ,  b=  t  n  L- 

i=2  ^      i=2  1-Yi 


n-1   Y.         n-1 

^   -  c  -   II  Y.  ,  and  d=c, 
i=2  ^ 


(4.49) 


n        n 
Thus,  Pr[   n  X.<c|   n  (1-X. )=t]  = 
i=l      i=l 


05  is  equivalent  to 
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n-1  ,-  ,  n-2 


n-1 


1-   /tfl-"  '/t/1-Yofl-t/  n  (1-Y.) 
^      i=2    ^ 


•  } 


f(YT,...,y„)dY„...dY, 


(4.50) 


where 


n-1 


/,  2 


F2  =  max{  Y^_i,l-t/(l-Y2)    n  (1-Y^)  ,  (-b+'^b'^-4ad)/2d  } 

i=3 

(4.51) 


with  a,  b,  and  d  defined  above.   Hence,  for  the  hypothesis 
^0'  °'-°'o  ^9^i^st  H^:  a<ctQ  with  B  unspecified,  the  test 


is  of  the  form 


(  n  X.) 
i=l  ^ 


1  if   n  Xj_  <_  c 

i=l 


0  if   n  X.  >  c 

i=l 


(4.52) 


where  c  is  found  through  the  conditional  distribution. 
A  similar  UMP  unbiased  test  could  be  developed  for  the 
hypothesis  Hq.-  a_<aQ  against  H-^^:  a>aQ,  with  B  unspecified. 
It  would  be  of  the  form 


(  n  X.)  =   \ 
i=l  ^ 


1  if   n  X.  >  c 

i=l   ~ 


0  if   n  Xj_  <  c 

i=l 


(4.53) 
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where  c  could  again  be  found  through  the  conditional 
distribution. 

Two-sided  Tests  for  a   Uhen  B  Is  Unknown 

A  UMP  unbiased  test  for  the  two-sided  hypothesis 

Hq:  ct=aQ  against  H-,  :  a4=aQ  could  be  developed.   Again, 

a  test  which  is  UMP  unbiased  for  the  conditional 

n 
problem,  conditioned  on  t_  =  IT  (1-X.),  will  be  UMP 

^    i=l    ^ 
unbiased  for  the  unconditional  problem  as  well. 

Ferguson  (1967)  gives  the  form  of  the  test. 


"^1 V  = 


1  if  t^<z^(.t^, .  .  .  ,t^)    or  t^>Z2  (t2,  .  .  .  ,tj^) 


y^{t^,... ,t^)    if  t^=z^(t2,. . . ,t^) ,  i=l,2 
0  if  z,  (t^, .  .  .  ,t,  )<t,<z„  (t-,  .  . .  ,t,  ) 


(4.54) 


where  the  conditions  to  find  z,  (t2 , . . . ,t,  )  and 
Z2  (t2, . .  .  ,tj^)  are 


E   (cl)(T)  I  T^,...  ,Ti^)  =  .05   and 

E  o(<t'(T)T-L  I  T2,...,T3^)  =  ctE  ^  (T^^  |  T^'-'-''^^^-  (4.55) 

®1  ®1 


Although  this  problem  has  not  been  developed  any  further, 
it  is  clear  that,  with  some  modification,  the  methods 
used  to  develop  one-sided  tests  could  be  used  here. 
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A  Test  for  the  Mean  of  the  Beta  Distribution 

In  theory  at  least,  it  is  possible  to  develop  a 
test  for  the  mean  of  the  beta  distribution.   Consider 
an  hypothesis  about  the  mean  such  as,  H^^ :  a/a+B  <    c 
against  H-j^:  a/a+B  >  c.   The  null  hypothesis  is  equivalent 
to  a  £  (a+B)c  or  a(l-c)-Bc  £  0.   This  is  an  hypothesis 
involving  a  linear  combination  of  the  parameters. 
Ferguson  (1967)  notes  that  the  general  theory  developed 
for  one-sided  and  two-sided  UMP  unbiased  tests  also 
applies  to  any  linear  combination  of  the  parameters. 
Consider 


L(X^,...,X^,a,B)  = 


r(a+B)  " 

_r(a)r(B)_ 


n  n   a-1 

n  X.   (1-X. ) 
i=l  ^ 


5-1 


r  r(a+B)  " 

Lr(a)r(B). 


exp{-t,-t2}  exp{at,+Bt2} 


r(a+B) 


r(a)r(B) 


exp{-t   -t    }    exp{ [a  (l-c)-Bc]       1 
•^      ^  1-c 


[t,+ 


2      i-c 


tj}     . 


(4.56) 


T  c 

Thus,  if  (T2^,T2)  is  transformed  to  (   1  ,  T2+  jzr^   T-j^) 

1-c 

then  the  parameter  associated  with  T-i/l-c  is  [a(l-c)-Bc], 

the  linear  combination  which  it  is  desired  to  test. 
Thus,  a  test  would  be  based  on  the  conditional  distri- 
bution of  Tt/1-c  given  T-,+  -^  T.  .  The  results  of  the 
l            2   i_c   1 
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previous  section  for  testing  a  with  6  unspecified  could 
be  used  to  completely  determine  the  test.   Its  general 
form  for  the  hypothesis  Hq :  a/a+B  £  c  against  H, :  a/a+6  >  c 
would  be 


(  ^1  )    =   { 


1  if  "^1   >  z(t-+ 


1-c 


2   1-c   1' 


0   if  ^1  <  2(t  +  -2_  t. )  . 
T=^      2   1-c   1 


(4.57) 
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CHAPTER  5 
ESTIMATORS  FOR  THE  DIRICHLET  DISTRIBUTION 

As  was  mentioned  in  Chapter  3,  the  solution  of  the 
likelihood  equations  for  the  maximum  likelihood  estimators 
of  the  parameters  of  the  Dirichlet  distribution  is 
similar  to  the  solution  of  the  likelihood  equations  for 
the  beta  distribution.   As  a  starting  point,  it  is 
again  necessary  to  have  moment  estimators  of  the 
parameters.   Consider  a  vector  X  =  (X, ,X2/X^)  which  has 
the  Dirichlet  distribution  with  parameters  a, ,  a2 , 
and  a_.   Expressing  X_  as  1-X,-X2,  the  density  may  be 
written 


T-  f~   +  ~   +  fv  )   a-i -1  oto-l         ^S"! 
f  (X.  ,X.,a.  ,a_,a_)  =  _^1___2___3_  X^    X^    (1-X  -X  )  . 
^      ^      ^      '^      ^  r(a^)r(a2)r(a3)  ^  ^  ^      ^ 

(5.1) 


To  find  the  moment  estimators  of  a-,  ,  a^,    and  a^  let 


n  n  ^  y^ 

M  =   Z  -^li  ,  M   =   Z  22i  /  M   =   E  ^li  (5.2) 

■•■    i=l  n     ^    i=l  n     ^        i=l  n 


and  equate  these  three  sample  moments  to  the  corresponding 
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population  moments .   The  population  moments  are 


ECX2)  =  a2/(a-|+a2+a3)  ,  and 

E(X^)  =  a3^(a3^+l)/(a2^+a2+a3)  (a2^+a2+a3+l)  .  (5.3! 


Equating  sample  moments  to  population  moments, 

^1  ^   o.-^/ iai+a2+OL2)  , 

Mj  =   a2/ (ct,+a2+a^)  ,    and 

"3   "   a^(aj^+l)/(a^+a2+a2)  (a^+a2+a2+l)  .  (5.4) 

These  three  equations  are  then  equivalent  to 

aj_(l-M^)  -  a2M^  "'^a^i  -   0' 

-a-,M2  +  02(1-1^2)  -  cxo^T  ~  ^'    ^^'^ 

a^CM^-M^)  -  a^^^    "^3^3  =  ^3"^^^!  *  (5.5) 

The  solution  of  these  three  simultaneous  equations 
yields  as  estimators  for  the  parameters 


Si  =  (M3-M3_)M^/(m2-M3)  , 

^2  =  (M3-Mj^)M2/(mJ-M3)  ,  and 
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a^  =  (M3-M^)  (l-M^-M2)/(Mj^-M3) 


(5.6) 


These,  then,  are  the  sample  moment  estimators  of  the 
three  parameters  of  the  distribution  of  X  =  (X-,X^,X_). 
If  X  had  been  a  vector  of  length  four  then  the  sample 
moment  estimators  would  have  been  the  solution  for 
a^^,  a2,  oto,  and  a^  of  four  simultaneous  equations. 

Using  these  sample  moment  estimators  as  the  initial 
estimators  in  an  iteration  process,  the  maximum  likelihood 
estimators  of  a  ,  a  ,  and  a  may  be  found  in  a  manner 
similar  to  the  solution  for  the  beta  distribution. 
With  the  density  as  given  in  Equation  5.1,  the  likelihood 
function  is  then 


L(X-|-i  ,.  .  .  ,Xi,X9-|  ,.  .  .  ,X^,a-i  ,ao,a-}) 


'11' 


'ln'^21 


r(a^   +   a^   +   ^3) 


yJa^TvTa^YvJa^j 


n   a,-l  a_-l  a^-1 

n  x,-^  X„T   (1-X,  .-X_.)  -^   , 
._,  li    2i       li   2i 


(5.7) 


Thus, 


In  L  =  n  Inr  (a-,+a2+a3)  -  n  InT  (a-,  )  -  n  lnr(a2) 


-  n  Inr (a  )  +   E  (a  -l)ln  X   +  (a2-l)ln  X2^ 


+  (a_-l)  ln(l-X. .-X^.) 
J  li   2i 


(5.8) 
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Now 


61n  L  =  n  m (a  +a^+a-)  -  n  ¥  (a, )  +   Z  In  x, •  =  0  , 
6aj_   •      i   ^   J         i    ^^3^    11 


51n  L  =  n  "¥  ioL  +a  +a    )    -   n  4'(a  )  +   Z  In  X- .  =  0 ,  and 
6a2         12   3         2     ^^^3^    2i 


61n  L  =  n  'I' (a  +a^+a^) -n  "^  {a)+   Z  In  (1-X,  . -X^  .  )  =  0, 
-^  12   3        3   i^i       li   2i 


(5.9) 


where   ^ (X)    =   51n   r(X)/oX.      Thus,    the   likelihood 
equations   are 


^'(a^^)    -   4' (a3^+a2+a2)    =      Z    In   X^^^/n  =  k^^    , 


¥(a2)    -   'i'(a3^+a2+a2)    =      Z    In   X2^/n  =  k2    ,    and 


4'CaJ    -   ¥(a   +a^+a,)    =      Z    ln(l-X     -X      )/n  =  k    .  (5.10) 

3  12      3  j^_-j^  li      2i  3 


Proceeding  with  the  Newton-Raphson  iteration  method  as 
was  done  for  the  beta  distribution,  let  a^  =  a   +h, 

^2  ~  °'20"^"'''  ^^*^  '^2    ~    °'30"^^'  ^'^^^^  ^'lO'  "^20'  ^^^  °'30  ^^^ 
the  initial  estimators  and  h,  1,  and  m  are  the  corrections. 


Then 


¥  (a-|^Q+h)  -¥  (a3^Q+a2Q+a3Q+h+l+m)  -k^ 


=   ^(otio)-^(o'io+'^20^°'30^-^l 


133 


+h 


5(¥(a^)-'i'(a^+a2+a2)-k^) 


6a- 


°'10'^20'^30 


+1 


6  {H'(a^)-W(a^+a2+a2)-k^) 


6a, 


l0'''20'"30 


+m 


6  ('i'(a    )-'V  (a   +a  +a    ) -k    ) 
1  12      3        1^ 


6a, 


°'l0'"20'^30    ' 


20  10      20      30  2 


'l'(a,„)-'l'(a,„+a^„+a^„)-k^ 
20  10      20      30         2 


+h 


6  {"V  {a^)-"^  {a^+a^+a^)-k^) 


6a, 


"l0'"20'°'30 


+1 


6(Y(a2)-^(a^+a2+a2)-k2) 


6a, 


°'l0'°'20'°'30 


+in 


5  {"H  (a2)-H'  (a3_+a2+a3)-k2) 


6a. 


"l0'°'20'"30    ' 


and 


¥(a  ^+111)-^  (a  +a  +a  +h+l+m)-k 
30       10   20   30         3 


^So^-^So-^°'20-^°'30^-'^3 
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Let 


+h 


&  {^  ia^)  -^  ia^+o.^+OL^)  -k^) 


5a, 


'10'"20'"30 


+1 


8{^  {a^)-"^  (a^+a^+a^)-k^) 


6a., 


°'10'"20'^30 


+m 


6  {"V  {a^)  -^  ia^+a^+a^)  -k^) 


6a. 


°'10'°'20'"30-    ^^'^^^ 


D  = 


6G  /6a,    6G/6a^ 
11      12 


5G2/6a^    662/602 


6G2/6a^    60^/602 


6G^/6a^ 


6G^/8a^ 


6G^/6a^ 


10   20   30 


where 


G^  =  ^(a^)  -  H' (0^+02+03)  -  k^  , 


G2  =  ^(02)  -  H*  (a2^+a2+a3)  -  k2  ,  and 


(5.12) 


G3  =  na^)    -   ^(a^+a2+a3)  -  k3 


(5.13) 
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Then 


^1  = 


^(«10+°'20+^30)+'^1-'^(«10) 


H'(a^Q+a2o+a3Q)+k2-4'(a2o) 


¥(a  +a   +a   )+k  -H*  (a   ) 
10   20   30    3     30 


6G^    6Gj 
TaZ       6a. 


2    2 

60-2         Soto 


6G3    6G3 
6a„   ToT 


^10'°'20'°'30' 


1  = 

1 


1   ^(a,^+a^„+a^„)+k  -^(a,^) 
-      10   20   30    1    10 


•^^2    ¥(a   +a   +a   )+k  -^  (a   ) 
J^  10   20   30    2    20 


6G. 
6a, 


^FCa   +a   +a   ) +k  -^(a   ) 
10   20   30    3     30 


6a- 


6G, 

6a, 


6G, 

6a, 


°'l0'''20'''30' 


and 


6G^ 

6a, 

«G, 

6a2 

^<"l0*°20^"30'"V*'"l0' 

""1  = 

'"2 

*=2 
«"2 

^So*"20*"3o'*V*So' 

6g 
3 

«G 
3 

*"2 

*<°10-'°20-'"30'-''^3--<"30> 

°'10'"20''^30' 
(5.14) 
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The  initial  estimators  are  then  corrected  to  obtain 
a  new  set  of  initial  estimators,  and  the  process  is 
repeated  until  it  converges  to  a  solution  for  the 
maximum  likelihood  estimators  of  a,,  a^ ,  and  a^. 
As  with  the  sample  moment  estimators,  if  X  =  (X-,  ,X2,X3,X^)  , 
a  vector  of  length  four,  there  would  be  four  equations 
to  solve  and  iterate  to  a  set  of  solutions.   In  general 
there  would  be  k  equations  for  X  =  (X, ,...,X,  ).   It  is 
quite  evident  that  the  solution  of  these  equations  would 
be  very  cumbersome  if  k  is  large.   Hence,  unless  there  is 
some  compelling  need  for  the  maximum  likelihood  estimators, 
one  would  probably  have  to  be  satisfied  with  the  sample 
moment  estimators. 
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